I ran the following Java code on a Linux server:
while (true) {
    int a = 1 + 2;
}
It caused one CPU core to reach 100% usage. I'm confused about this, because I learnt that CPUs handle tasks by time slicing, meaning that the CPU does one task per time slot (the scheduler hands out ranges of CPU time). If there are 10 time slots, the while (true) task should use at most 10% of the CPU, because the other 90% would be assigned to other tasks. So why is it 100%?
If the CPU is not already at 100%, a process can use as much of it as it wants (up to 100%) until other processes request the resource. The process scheduler attempts to maximize CPU usage and never leaves a process starved of CPU time if no other process needs it.
So your while loop will use 100% of the available idle CPU resources, and will only begin to use less when other CPU-intensive processes start up. (If you're using Linux/Unix, you can observe this with top: start your while loop, then start another CPU-intensive process and watch the %CPU drop for the process with the loop.)
In a multitasking OS the CPU time is split between execution flows (processes, threads), that's true. What happens here is that the loop executes until a clock interrupt occurs, which the OS uses to schedule the next execution "piece" from another process or thread. But as long as your code doesn't reach specific points (input/output or other system calls that can switch the process into the "waiting" state, such as waiting for synchronization objects, sleep operations, etc.), the process stays in the "running" state, which tells the scheduler to keep it in the execution queue and reschedule it at the first opportunity. If there are "competing" processes that also stay in the "running" state for long periods, CPU usage will be shared between your process and all of them; otherwise your process will be rescheduled immediately, which causes continuous high CPU consumption anyway.
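The difference between a thread that stays runnable and one that enters the waiting state can be observed from inside the JVM. The following is only a minimal sketch, assuming the JVM supports per-thread CPU time measurement through ThreadMXBean; the class name SpinVsSleep and its helper methods are made up for illustration and do not come from the original posts.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class SpinVsSleep {
    // Runs the task on the current thread and returns the CPU time (in ns)
    // the thread was charged while running it.
    // Assumption: per-thread CPU time is supported and enabled on this JVM.
    static long cpuNanosFor(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long before = mx.getCurrentThreadCpuTime();
        task.run();
        return mx.getCurrentThreadCpuTime() - before;
    }

    // Busy loop: never blocks, so the thread stays runnable the whole time.
    static void spin(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        while (System.nanoTime() < end) {
            // intentionally empty
        }
    }

    // Sleep: a system call that moves the thread into the waiting state.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long spinCpu = cpuNanosFor(() -> spin(200));
        long sleepCpu = cpuNanosFor(() -> sleep(200));
        System.out.printf("spin:  %d ms of CPU time%n", spinCpu / 1_000_000);
        System.out.printf("sleep: %d ms of CPU time%n", sleepCpu / 1_000_000);
    }
}
```

On an otherwise idle machine the spinning call should be charged close to its full wall-clock duration, while the sleeping call should be charged almost nothing, although the exact numbers depend on the OS and on what else is running.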
When CPU usage is 60%, I use flame graphs (perf record) to capture where the CPU time goes. Why is the 40% of idle-related stack usage not displayed in the flame graphs? The idle stack usually shows up as less than 5%.
For flame graphs, the point is normally to measure where a process spends CPU time while it's running, not which blocking functions it calls that make it sleep, or where it gets scheduled out and sleeps when it doesn't want to.
I capture performance data for one CPU, not one process. According to the operating system design, if there is no active task on the CPU, the CPU calls an idle waiting function. For example, Linux often calls schedule_idle until it is interrupted by a new task. Therefore, I expected to find schedule_idle in the flame graph, consuming 40% of the CPU usage.
Perf events like cycles don't increment when the clock is halted (e.g. cycles is cpu_clk_unhalted.thread_p or similar). If you really wanted to see time spent idle, you might be able to disable idle power saving to get Linux to just spin in a loop instead of using x86 monitor/mwait or even basic hlt to put the CPU into a C-state where the clock doesn't tick.
Or run your code pinned to one logical core, and on the other logical core, pin a task that runs the pause instruction in a loop. So the physical core's clock keeps ticking for the core you're counting events for.
You should still get counts for cpu_clk_unhalted.thread_any ([Core cycles when at least one thread on the physical core is not in halt state]) when recording that event on the logical core with your task, even when that logical core is asleep.
And you can also record counts for cpu_clk_unhalted.thread to count cycles when this (hardware) thread aka logical core isn't halted, to know how much CPU time you actually used. (Or use the software event task-clock for that.)
Use perf list to see events available on your CPU, and read their descriptions carefully.
As far as I know, for thread scheduling, Linux implements a fair scheduler and Windows implements round-robin (RR) schedulers: each thread has a time slice for its execution (correct me if I'm wrong).
I wonder, is the CPU usage related to the thread scheduling?
For example: there are 2 threads executing at the same time, and the time slice for system is 15ms. The cpu has only 1 core.
Thread A needs 10ms to finish the job and then sleep 5ms, run in a loop.
Thread B needs 5ms to finish the job and then sleep 10ms, also in a loop.
Will the CPU usage be 100%?
How is the thread scheduled? Will thread A use up all its time and then schedule out?
One More Scenario:
Suppose I have a thread A that is running and then gets blocked by some condition (e.g. network). Will the CPU being at 100% affect the wakeup time of this thread? For example, a thread B may be running in this time window; will it be preempted by the OS in favor of thread A?
As far as I know, Linux implements a fair scheduler and Windows
implements round-robin (RR) schedulers for thread scheduling,
Both Linux and Windows use priority-based, preemptive thread schedulers. Fairness matters but it's not, strictly speaking, the objective. Exactly how these schedulers work depends on the version and the flavor (client vs. server) of the system. Generally, thread schedulers are designed to maximize responsiveness and alleviate scheduling hazards such as priority inversion and starvation. Although some scheduling decisions are made in a round-robin fashion, there are situations in which the scheduler may insert the preempted thread at the front of the queue rather than at the back.
each thread has a time slice for its execution.
The time slice (or quantum) is really more like a guideline than a rule. It's important to understand that a time slice is divisible and equals some variable number of clock cycles. The scheduler charges CPU usage in terms of clock cycles, not time slices. A thread may run for more than a time slice (e.g., a time slice and a half). A thread may also voluntarily relinquish the rest of its time slice. The only way for a thread to do so is by performing a system call (sleep, yield, acquire lock, request synchronous I/O). All of these are privileged operations that cannot be performed in user mode (otherwise, a thread could go to sleep without telling the OS!). The scheduler can then change the state of the thread from "running" to "waiting" and schedule some other ready thread to run. If a thread relinquishes the rest of its time slice, it will not be compensated the next time it is scheduled to run.
One particularly interesting case is when a hardware interrupt occurs while a thread is running. In this case, the processor will automatically switch to the interrupt handler, forcibly preempting the thread even if its time slice has not finished yet. In this case, the thread will not be charged for the time it takes to handle the interrupt. Note that the interrupt handler would be indeed utilizing the CPU. By the way, the overhead of context switching itself is also not charged towards any time slice. Moreover, on Windows, the fact that a thread is running in user-mode or kernel-mode by itself does not have an impact on its priority or time slice. On Linux, the scheduler is invoked at specific places in the kernel to avoid starvation (kernel preemption implemented in Linux 2.5+).
So the CPU usage will be 100%? And how is the thread scheduled? Will
thread A use up all its time and then schedule out?
It's easy to answer these questions now. When a thread goes to sleep, the other gets scheduled. Note that this happens even if the threads have different priorities.
If I have a thread running that gets blocked by some
condition (e.g. network), will the CPU being at 100% affect the wakeup
time of this thread? For example, another thread may be running in its
time window and may not be scheduled out by the OS?
Linux and Windows schedulers implement techniques to enable threads that are waiting on I/O operations to "wake up quickly" and get higher chances of being scheduled soon. For example, on Windows, the priority of a thread waiting on an I/O operation may be boosted a bit when the I/O operation completes. This means that it can preempt another running thread before finishing its time slice, even if both threads had the same priorities initially. When a boosted-priority thread wakes up, its original priority is restored.
So the CPU usage will be 100%?
Ideally speaking, the answer would be yes, and by ideally I mean that you are not considering the time wasted performing a context switch. Practically, CPU utilization is increased by keeping the CPU busy all of the time, but some amount of time is still wasted on context switches (the time it takes to switch from one process or thread to another).
But I would say that in your case the time constraints of both threads are aligned perfectly to have maximum CPU utilization.
And how is the thread scheduled? Will thread A use up all its time and
then schedule out?
Well, it really depends. In most modern operating system implementations, if there is another process in the ready queue, the current process is scheduled out as soon as it is done with the CPU, regardless of whether it still has time quantum left. So yes, if you are considering a modern OS design, thread A is scheduled out right after 10ms.
If 5 processes arrive at the CPU at different times, each with its own CPU burst, and shortest-process-next scheduling is used, would overhead exist only if, let's say, process one finishes before the next process arrives? Is the overhead the idle time of the CPU?
You should think about the number of CPUs. If two processes are using the same CPU, they will compete and push the overall finish time out. A CPU switching between different processes or threads can slow things down more than idle time does. So I would keep it to one process at a time per CPU.
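The two-thread scenario from the question can be checked with a tiny simulation. This is a hedged sketch under simplifying assumptions that are not in the original posts: one core, 1 ms time steps, zero context-switch cost, and a running thread keeps the core until it goes to sleep (both bursts are shorter than the 15 ms slice). The names TwoThreadSim, SimThread and busyMillis are hypothetical.

```java
class TwoThreadSim {
    // One simulated thread: needs workMs of CPU, then sleeps for sleepMs, in a loop.
    static final class SimThread {
        final int workMs;
        final int sleepMs;
        int workLeft;   // CPU time still needed in the current burst
        int wakeAt = 0; // simulated time at which the thread becomes ready

        SimThread(int workMs, int sleepMs) {
            this.workMs = workMs;
            this.sleepMs = sleepMs;
            this.workLeft = workMs;
        }
    }

    // Returns how many of the totalMs simulated milliseconds the single core was busy.
    static int busyMillis(int totalMs, SimThread... threads) {
        int busy = 0;
        SimThread running = null;
        for (int now = 0; now < totalMs; now++) {
            if (running == null) {
                // Pick the first thread that is ready to run.
                for (SimThread t : threads) {
                    if (t.wakeAt <= now) {
                        running = t;
                        break;
                    }
                }
            }
            if (running == null) {
                continue; // nothing is ready: the core idles for this millisecond
            }
            busy++;
            running.workLeft--;
            if (running.workLeft == 0) {
                // Burst finished: the thread sleeps and gives up the core.
                running.wakeAt = now + 1 + running.sleepMs;
                running.workLeft = running.workMs;
                running = null;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        int total = 300;
        int busy = busyMillis(total, new SimThread(10, 5), new SimThread(5, 10));
        System.out.printf("core busy %d of %d ms (%.1f%%)%n", busy, total, 100.0 * busy / total);
    }
}
```

Under these assumptions the core is busy in every simulated millisecond, because each thread's sleep exactly covers the other thread's burst. Running thread A alone leaves the core idle for 5 ms out of every 15 ms.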
I know this question has been asked many times in many different manners, but it's still not clear for me what the CPU load % means.
I'll start explaining how I perceive the concepts now (of course, I might, and sure will, be wrong):
A CPU core can only execute one instruction at a time. It will not execute the next instruction until it finishes executing the current one.
Suppose your box has one single CPU with one single core. Parallel computing is hence not possible. Your OS's scheduler will pick up a process, set the IP to the entry point, and send that instruction to the CPU. It won't move to the next instruction until the CPU finishes executing the current instruction. After a certain amount of time it will switch to another process, and so on. But it will never switch to another process if the CPU is currently executing an instruction. It will wait until the CPU becomes free to switch to another process. Since you only have one single core, you can't have two processes executing simultaneously.
I/O is expensive. Whenever a process wants to read a file from the disk, it has to wait until the disk accomplishes its task, and the current process can't execute its next instruction until then. The CPU is not doing anything while the disk is working, and so our OS will switch to another process until the disk finishes its job in order not to waste time.
Following these principles, I've come myself to the conclusion that CPU load at a given time can only be one of the following two values:
0% - Idle. CPU is doing nothing at all.
100% - Busy. CPU is currently executing an instruction.
This is obviously false, as taskmgr reports CPU usage values of 1%, 12%, 15%, 50%, etc.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core (as reported by taskmgr)? While that given process is executing, what happens with the 99%?
What does it mean that the overall CPU usage is 19% (as reported by Rainmeter at the moment)?
If you look at Task Manager on Windows, there is an Idle process that does exactly that: it shows the amount of cycles spent not doing anything useful. Yes, the CPU is always busy, but it might just be running in a loop waiting for useful things to come.
Since you only have one single core, you can't have two processes
executing simultaneously.
This is not really true. Yes, true parallelism is not possible with a single core, but you can create the illusion of it with preemptive multitasking. Yes, it is impossible to interrupt an instruction midway, but that is not a problem because most instructions take a tiny amount of time to finish. The OS shares CPU time in time slices, which are significantly longer than the execution time of a single instruction.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core
Most of the time, applications are not doing anything useful. Think of an application that waits for the user to click a button before it starts processing something. This app doesn't need the CPU, so it sleeps most of the time, or every time it gets a time slice it just goes back to sleep (see the event loop in Windows). GetMessage is blocking, which means the thread will sleep until a message arrives. So what does CPU load really mean? Imagine the app receives some events or data to act on; it will then do operations instead of sleeping. Saying that it utilizes X% of the CPU means that over the sampling period the app used X% of the CPU time. CPU usage is an average metric.
PS: To summarize the concept of CPU load, think of speed (in terms of physics). There are instantaneous and average speeds, and likewise there are instantaneous and average measurements of CPU load. The instantaneous value is always either 0% or 100%, because at any given point in time a process either uses the CPU or it doesn't. If a process used 100% of the CPU for 250ms and didn't use it for the next 750ms, then we can say that the process loaded the CPU at 25% over a sampling period of 1 second (an average measurement only makes sense with a given sampling period).
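The 250ms-out-of-1-second example can be written down directly. This is only an illustrative sketch: the class CpuLoadAverage, the method averageLoadPercent and the 1 ms resolution of the samples are assumptions made for the example, not how any particular task manager is implemented.

```java
class CpuLoadAverage {
    // busy[i] is the instantaneous load during millisecond i of the sampling
    // window: true means the process held the CPU (100%), false means it did not (0%).
    static double averageLoadPercent(boolean[] busy) {
        if (busy.length == 0) {
            throw new IllegalArgumentException("empty sampling window");
        }
        int used = 0;
        for (boolean b : busy) {
            if (b) {
                used++;
            }
        }
        return 100.0 * used / busy.length;
    }

    public static void main(String[] args) {
        boolean[] window = new boolean[1000]; // 1 second sampling period
        for (int i = 0; i < 250; i++) {
            window[i] = true;                 // busy for the first 250 ms
        }
        System.out.println(averageLoadPercent(window) + "%"); // prints "25.0%"
    }
}
```

Every individual sample is either 0% or 100%; the intermediate figures only appear once the samples are averaged over a window.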
http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.
This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU. Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run.
Also see: http://en.wikipedia.org/wiki/Load_(computing)
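For reference, the JVM exposes the system load average through the standard OperatingSystemMXBean. The sketch below reads it and divides by the number of available processors to get a rough "cars per lane" figure; the class name LoadAverageProbe and the helper loadPerCore are hypothetical, and the per-core division is just one common way to read the number.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class LoadAverageProbe {
    // Load per core. A negative load average means "not available on this
    // platform" and is reported as -1.
    static double loadPerCore(double loadAverage, int cores) {
        if (loadAverage < 0 || cores <= 0) {
            return -1.0;
        }
        return loadAverage / cores;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // last-minute average, negative if unavailable
        int cores = os.getAvailableProcessors();
        System.out.printf("load average: %.2f over %d cores, per core: %.2f%n",
                load, cores, loadPerCore(load, cores));
    }
}
```

A per-core value around 1.0 roughly corresponds to a bridge that is full but has no queue; values above that suggest tasks are waiting to run.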
That is, if a core spends most of its time waiting for data from RAM or the L3 cache because of cache misses, but the system is real-time (real-time thread priority), and the thread is pinned (affinity) to the core and runs without thread/context switches, what load (usage) should the CPU core show on a modern x86_64?
That is, does the displayed CPU usage decrease only when the core goes idle?
And, if anyone knows, is the behavior in this case different on other processors: ARM, Power[PC], Sparc?
Clarification: I mean the CPU usage shown in the standard Task Manager on Windows.
A hardware thread (logical core) that's stalled on a cache miss can't be doing anything else, so it still counts as busy for the purposes of task-managers / CPU time accounting / OS process scheduler time-slices / stuff like that.
This is true across all architectures.
Without hyperthreading, "hardware thread" / "logical core" are the same as a "physical core".
Morphcore / other on-the-fly changing between hyperthreading and a more powerful single core could make there be a difference between a thread that keeps many execution units busy, vs. a thread that is blocked on cache misses a lot of the time.
I don't get the link between the OS CPU usage statistics and the optimal use of the pipeline. I think they are uncorrelated as the OS doesn't measure the pipeline load.
I'm writing this in the hope that Peter Cordes can help me understand it better and as a continuation of the comments.
User programs relinquish control to the OS very often: when they need input from the user or when they are done with a signal/message. GUI programs are basically just big loops, and at each iteration control is given to the OS until the next message.
When the OS has control it schedules other threads/tasks, and if no other actions are needed it just enters the idle process (a long time ago a tight loop, now a sleep state) until the next interrupt. This is the Idle Time.
Time spent in an ISR processing user input is considered idle time by any OS.
And a cache miss there would still be considered idle time.
A heavy program takes more time to complete the work for a given message, thereby returning control to the OS, say, 2 times a second instead of 20.
If the OS measures that in the last second it had control for only 20ms, then the CPU usage is (1000-20)/1000 = 98%.
This has nothing to do with the optimal use of the CPU architecture; as said, stalls can occur in the OS code and still be part of the Idle time statistic.
The CPU utilization at pipeline level is not what is measured, and it is orthogonal to the OS statistics.
CPU usage is meant to be used by sysadmins. It is a measure of the load you put on a system, not a measure of how efficiently the assembly of a program was generated.
Sysadmins can't help with that, but measuring how often the OS got control back (without preempting) is a measure of how much load a program is putting on the system.
And sysadmins can definitely terminate heavy programs.