User CPU time vs System CPU time? - cpu

Could you explain more about "user CPU time" and "system CPU time"? I have read a lot, but I couldn't understand it well.

The difference is whether the time is spent in user space or kernel space. User CPU time is time spent on the processor running your program's code (or code in libraries); system CPU time is the time spent running code in the operating system kernel on behalf of your program.

User CPU time: the amount of time the processor spent running that specific program's own code.
System CPU time: the amount of time the processor spent running operating system functions on behalf of that specific program.

The term ‘user CPU time’ can be a bit misleading at first. To be clear, a program's total CPU time is the sum of two parts: the time the CPU spends running the program's own code (user CPU time) and the time the CPU spends in the kernel executing system calls on the program’s behalf (system CPU time). When a program loops through an array, it is accumulating user CPU time. Conversely, when a program executes a system call such as exec or fork, it is accumulating system CPU time.
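The split can be observed from inside a process. The following is a minimal sketch, assuming Python and its standard `os.times()` call; the helper names `cpu_times`, `busy_loop`, and `many_syscalls` are made up for illustration. A pure computation loop should mostly add user time, while a loop of `os.stat` calls enters the kernel repeatedly and should add noticeably more system time (the interpreter's own overhead still shows up as user time).

```python
import os

def cpu_times():
    """Return (user, system) CPU seconds consumed so far by this process."""
    t = os.times()
    return t.user, t.system

def busy_loop(n):
    # Pure computation: runs in user space.
    total = 0
    for i in range(n):
        total += i * i
    return total

def many_syscalls(n):
    # Each os.stat call crosses into the kernel, which is billed as system time.
    for _ in range(n):
        os.stat(".")

u0, s0 = cpu_times()
busy_loop(2_000_000)
u1, s1 = cpu_times()
many_syscalls(50_000)
u2, s2 = cpu_times()

print(f"loop:     user +{u1 - u0:.3f}s  system +{s1 - s0:.3f}s")
print(f"syscalls: user +{u2 - u1:.3f}s  system +{s2 - s1:.3f}s")
```

The exact numbers depend on the machine and on the clock-tick resolution of the OS accounting, so treat them as indicative only.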

Based on Wikipedia:
User time is the amount of time the CPU was busy executing code in user space.
System time is the amount of time the CPU was busy executing code in kernel space. If this value is reported for a thread or process, then it represents the amount of time the kernel was doing work on behalf of the executing context, for example, after a thread issued a system call.

Related

Flame graph(perf record) cannot display accurate CPU idle usage

When CPU usage is 60% and a flame graph (perf record) is used to capture it, why is the remaining 40% of idle-related stack usage not displayed in the flame graph? The share of the idle stack is often less than 5%.
For flame graphs, the point is normally to measure where a process spends CPU time while it's running, not which blocking functions it calls that make it sleep, or where it gets scheduled out and sleeps when it doesn't want to.
I capture performance data for one CPU, not one process. According to the operating system design, if there is no active task on the CPU, the CPU calls an idle waiting function. For example, Linux often calls schedule_idle until it is interrupted by a new task. Therefore, I expect schedule_idle to show up in the flame graph and to account for 40% of the CPU usage.
Perf events like cycles don't increment when the clock is halted (e.g. cycles is cpu_clk_unhalted.thread_p or similar). If you really wanted to see time spent idle, you might be able to disable idle power saving to get Linux to just spin in a loop instead of using x86 monitor/mwait or even basic hlt to put the CPU into a C-state where the clock doesn't tick.
Or run your code pinned to one logical core, and on the other logical core, pin a task that runs the pause instruction in a loop. So the physical core's clock keeps ticking for the core you're counting events for.
You should still get counts for cpu_clk_unhalted.thread_any ([Core cycles when at least one thread on the physical core is not in halt state]) when recording that event on the logical core with your task, even when that logical core is asleep.
And you can also record counts for cpu_clk_unhalted.thread to count cycles when this (hardware) thread aka logical core isn't halted, to know how much CPU time you actually used. (Or use the software event task-clock for that.)
Use perf list to see events available on your CPU, and read their descriptions carefully.

Intentionally high CPU usage, GCD, QOS_CLASS_BACKGROUND, and spindump

I am developing a program that happens to use a lot of CPU cycles to do its job. I have noticed that it, and other CPU intensive tasks, like iMovie import/export or Grapher Examples, will trigger a spin dump report, logged in Console:
1/21/16 12:37:30.000 PM kernel[0]: process iMovie[697] thread 22740 caught burning CPU! It used more than 50% CPU (Actual recent usage: 77%) over 180 seconds. thread lifetime cpu usage 91.400140 seconds, (87.318264 user, 4.081876 system) ledger info: balance: 90006145252 credit: 90006145252 debit: 0 limit: 90000000000 (50%) period: 180000000000 time since last refill (ns): 116147448571
1/21/16 12:37:30.881 PM com.apple.xpc.launchd[1]: (com.apple.ReportCrash[705]) Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.ReportCrash
1/21/16 12:37:30.883 PM ReportCrash[705]: Invoking spindump for pid=697 thread=22740 percent_cpu=77 duration=117 because of excessive cpu utilization
1/21/16 12:37:35.199 PM spindump[423]: Saved cpu_resource.diag report for iMovie version 9.0.4 (1634) to /Library/Logs/DiagnosticReports/iMovie_2016-01-21-123735_cudrnaks-MacBook-Pro.cpu_resource.diag
I understand that high CPU usage may be associated with software errors, but some operations simply require high CPU usage. It seems a waste of resources to watch-dog and report processes/threads that are expected to use a lot of CPU.
In my program, I use four serial GCD dispatch queues, one for each core of the i7 processor. I have tried using QOS_CLASS_BACKGROUND, and spin dump recognizes this:
Primary state: 31 samples Non-Frontmost App, Non-Suppressed, Kernel mode, Thread QoS Background
The fan spins much more slowly when using QOS_CLASS_BACKGROUND instead of QOS_CLASS_USER_INITIATED, and the program takes about 2x longer to complete. As a side issue, Activity Monitor still reports the same % CPU usage and even longer total CPU Time for the same task.
Based on Apple's Energy Efficiency documentation, QOS_CLASS_BACKGROUND seems to be the proper choice for something that takes a long time to complete:
Work takes significant time, such as minutes or hours.
So why then does it still complain about using a lot of CPU time? I've read about methods to disable spindump, but these methods disable it for all processes. Is there a programmatic way to tell the system that this process/thread is expected to use a lot of CPU, so don't bother watch-dogging it?

What exactly is CPU load if instructions are executed one at a time?

I know this question has been asked many times in many different manners, but it's still not clear for me what the CPU load % means.
I'll start explaining how I perceive the concepts now (of course, I might, and sure will, be wrong):
A CPU core can only execute one instruction at a time. It will not execute the next instruction until it finishes executing the current one.
Suppose your box has one single CPU with one single core. Parallel computing is hence not possible. Your OS's scheduler will pick up a process, set the IP to the entry point, and send that instruction to the CPU. It won't move to the next instruction until the CPU finishes executing the current instruction. After a certain amount of time it will switch to another process, and so on. But it will never switch to another process if the CPU is currently executing an instruction. It will wait until the CPU becomes free to switch to another process. Since you only have one single core, you can't have two processes executing simultaneously.
I/O is expensive. Whenever a process wants to read a file from the disk, it has to wait until the disk accomplishes its task, and the current process can't execute its next instruction until then. The CPU is not doing anything while the disk is working, and so our OS will switch to another process until the disk finishes its job in order not to waste time.
Following these principles, I've come myself to the conclusion that CPU load at a given time can only be one of the following two values:
0% - Idle. CPU is doing nothing at all.
100% - Busy. CPU is currently executing an instruction.
This is obviously false, as taskmgr reports CPU usage values such as 1%, 12%, 15%, 50%, etc.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core (as reported by taskmgr)? While that given process is executing, what happens with the 99%?
What does it mean that the overall CPU usage is 19% (as reported by Rainmeter at the moment)?
If you look at the Task Manager on Windows, there is an Idle process that does exactly that: it just shows the amount of cycles not spent doing anything useful. Yes, the CPU is always busy, but it might just be running in a loop waiting for useful work to arrive.
Since you only have one single core, you can't have two processes
executing simultaneously.
This is not really true. Yes, true parallelism is not possible with a single core, but you can create the illusion of it with preemptive multitasking. Yes, it is impossible to interrupt an instruction midway, but that is not a problem because most instructions take a tiny amount of time to finish. The OS shares the CPU in time slices, which are significantly longer than the execution time of a single instruction.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core
Most of the time, applications are not doing anything useful. Think of an application that waits for the user to click a button before it starts processing something. This app doesn't need the CPU, so it sleeps most of the time, or every time it gets a time slice it just goes back to sleep (see the event loop in Windows). GetMessage is blocking, which means the thread sleeps until a message arrives. So what does CPU load really mean? Imagine the app receives some events or data to process: it will then do work instead of sleeping. Saying it utilizes X% of the CPU means that, over the sampling period, the app used X% of the available CPU time. CPU usage is an average metric.
PS: To summarize the concept of CPU load, think of speed (in terms of physics). There are instantaneous and average speeds, and likewise there are instantaneous and average measurements of CPU load. The instantaneous value is always either 0% or 100%, because at any given moment a process either is using the CPU or is not. If a process used 100% of the CPU for 250 ms and didn't use it for the next 750 ms, then we can say the process loaded the CPU at 25% over a sampling period of 1 second (an average measurement only makes sense for a stated sampling period).
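The 250 ms out of 1 second example can be written out as a small calculation. This is only an illustrative sketch: the function name `average_utilization` and the representation of busy periods as `(start, end)` pairs in seconds are assumptions, and the intervals are assumed not to overlap.

```python
def average_utilization(busy_intervals, window_start, window_end):
    """Percent of the sampling window during which the process was on the CPU."""
    window = window_end - window_start
    if window <= 0:
        raise ValueError("sampling window must have positive length")
    busy = 0.0
    for start, end in busy_intervals:
        # Clip each busy interval to the sampling window.
        lo = max(start, window_start)
        hi = min(end, window_end)
        if hi > lo:
            busy += hi - lo
    return 100.0 * busy / window

# Busy for the first 250 ms, idle for the remaining 750 ms.
print(average_utilization([(0.0, 0.25)], 0.0, 1.0))   # 25.0
# Same activity, but sampled over only the first 250 ms.
print(average_utilization([(0.0, 0.25)], 0.0, 0.25))  # 100.0
```

The second call shows why the sampling period has to be stated: the same activity reads as 25% or 100% depending on the window.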
http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.
This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU. Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run.
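On Unix-like systems the load averages can be read directly. A minimal sketch, assuming Python's standard `os.getloadavg()` (not available on Windows); dividing by the core count is just one common rule of thumb for reading the number, not part of the metric itself.

```python
import os

# 1-, 5-, and 15-minute load averages (Unix-like systems only).
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count() or 1

print(f"load averages: {load1:.2f} {load5:.2f} {load15:.2f}")
# Rough reading: a per-core value above 1.0 suggests work is queueing.
print(f"1-minute load per core: {load1 / cores:.2f}")
```

Note that this is a different metric from the CPU usage percentage: it counts processes, not the fraction of time the CPU was busy.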
Also see: http://en.wikipedia.org/wiki/Load_(computing)

What will the workload (usage) of a CPU core be if there are persistent cache misses? Will it be 100%?

That is, if the core spends most of its time waiting for data from RAM or the L3 cache because of cache misses, but the thread runs at real-time priority, is pinned (affinity) to that core, and runs without thread/context switches, what load (usage) should that CPU core show on modern x86_64?
That is, does the displayed CPU usage decrease only when the core goes into Idle?
And if anyone knows: is the behavior in this case different for other processors, such as ARM, Power[PC], or Sparc?
Clarification: I mean the CPU usage shown in the standard Task Manager on Windows.
A hardware thread (logical core) that's stalled on a cache miss can't be doing anything else, so it still counts as busy for the purposes of task-managers / CPU time accounting / OS process scheduler time-slices / stuff like that.
This is true across all architectures.
Without hyperthreading, "hardware thread" / "logical core" are the same as a "physical core".
Morphcore / other on-the-fly changing between hyperthreading and a more powerful single core could make there be a difference between a thread that keeps many execution units busy, vs. a thread that is blocked on cache misses a lot of the time.
I don't get the link between the OS CPU usage statistics and the optimal use of the pipeline. I think they are uncorrelated as the OS doesn't measure the pipeline load.
I'm writing this in the hope that Peter Cordes can help me understand it better and as a continuation of the comments.
User programs relinquish control to the OS very often: when they need input from the user or when they are done handling a signal/message. GUI programs are basically just big loops, and at each iteration control is given to the OS until the next message.
When the OS has control, it schedules other threads/tasks, and if no other actions are needed it just enters the idle process (long ago a tight loop, now a sleep state) until the next interrupt. This is the idle time.
Time spent in an ISR processing user input is considered idle time by any OS. And a cache miss there would still be considered idle time.
A heavy program takes more time to complete the work for a given message, thereby returning control to the OS, say, 2 times in a second instead of 20.
If the OS measures that in the last second it had control for only 20 ms, then the CPU usage is (1000-20)/1000 = 98%.
This has nothing to do with the optimal use of the CPU architecture; as said, stalls can occur in the OS code and still be part of the idle time statistic.
The CPU utilization at the pipeline level is not what is measured, and it is orthogonal to the OS statistics.
CPU usage is meant to be used by sysadmins. It is a measure of the load you put on a system, not a measure of how efficiently the assembly of a program was generated.
Sysadmins can't help with that, but measuring how often the OS got control back (without preempting) is a measure of how much load a program is putting on the system.
And sysadmins definitely can terminate heavy programs.
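The accounting described in this thread can be checked from user space. A small sketch, assuming Python's standard `time.process_time()` (CPU time of the current process) and `time.perf_counter()` (wall-clock time); the helper names are made up. A thread that sleeps hands the CPU back to the OS and accumulates almost no CPU time, while a thread that never yields is billed for the whole interval, regardless of how well it uses the pipeline.

```python
import time

def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds) it consumed."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    work()
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return cpu, wall

def sleep_200ms():
    # Blocks in the OS; the CPU is free to idle or run other tasks.
    time.sleep(0.2)

def spin_200ms():
    # Never yields voluntarily; stays on the CPU until the deadline.
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass

for name, work in (("sleep", sleep_200ms), ("spin", spin_200ms)):
    cpu, wall = measure(work)
    print(f"{name}: cpu={cpu:.3f}s wall={wall:.3f}s usage={100 * cpu / wall:.0f}%")
```

On an otherwise quiet machine the sleeping case should report usage near 0% and the spinning case near 100%, but the exact figures vary with scheduling.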

System performance analysis

I am trying to understand how performance can be measured using the time command on Unix systems. Let's say I run the time command on three different machines and get the following results:
A: 282u(user cpu time) 3S(system cpu time) 4:45(elapsed time) 99%
B: 238u 5S 4:13 98%
C: 302u 9S 5:11 97%
Which system will have the highest performance?
According to man time, user time is how long your program spent on the CPU in user mode, and system time is the time spent in the kernel performing privileged operations, such as the I/O calls read and write, on behalf of your program. User + system time is 285 s on A (282 + 3), 243 s on B (238 + 5), and 311 s on C (302 + 9), so machine B needs the least CPU time. B also has the shortest elapsed time (4:13), so it shows the best performance of the three machines.
Elapsed time is wall-clock time, measured from when the process is spawned until it terminates. It is not a measure of CPU usage: it also includes any time the process spent waiting, for example for I/O or for its turn on the CPU.
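The comparison can be tabulated with a few lines of code. This is a sketch under one stated assumption: the elapsed time for machine C is read as 5:11 (minutes:seconds), like the other two rows. The names `runs`, `summary`, and `to_seconds` are made up for illustration.

```python
def to_seconds(elapsed):
    """Convert a 'minutes:seconds' string to whole seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + int(seconds)

# (user seconds, system seconds, elapsed) as reported by time.
# Assumption: C's elapsed time is 5:11, in the same format as A and B.
runs = {
    "A": (282, 3, "4:45"),
    "B": (238, 5, "4:13"),
    "C": (302, 9, "5:11"),
}

summary = {}
for name, (user, system, elapsed) in runs.items():
    summary[name] = (user + system, to_seconds(elapsed))
    cpu, wall = summary[name]
    print(f"{name}: cpu={cpu}s elapsed={wall}s")

fastest = min(summary, key=lambda n: summary[n][1])
print("shortest elapsed time:", fastest)  # B
```

Machine B comes out lowest on both total CPU time (243 s) and elapsed time (253 s).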
