What does % CPU utilization mean in application runtime statistics? - performance

When I run an application and, at the same time, use a runtime evaluator to profile my program, I get many statistics at the end of the process. One of these is the CPU utilization.
Typically, what is the CPU utilization? You might tell me it is the percentage obtained by dividing the total time the process spent on the CPU by the overall simulation time. Unfortunately, my statistics are very detailed and the program I am using is very precise: it gives me a chart of the CPU utilization over time.
So in my chart I have time on the x axis and CPU utilization in % on the y axis.
So something like this:
cpu%
^
|
|
| *
| * * *
| * * *
| * * *
| * * *
| * * *
|* *
-------------------------------------------------> time
So, what does it mean? How should I interpret the following sentence?
"The cpu utilization percentage for
process 'MyProcess' at time '5.23 s'
is 12%"

It's the percentage of the CPU's cycles being spent on your process. If your process were using all of the CPU's available time, it would be 100%. At 12% the process is probably waiting on I/O or something like that. See http://en.wikipedia.org/wiki/CPU_usage.
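To make that per-interval definition concrete, here is a minimal Go sketch of my own (not the profiler's actual method): given two samples of the process's accumulated CPU time, the utilization over the interval is the CPU time gained divided by the wall-clock time elapsed. The sample values are invented to reproduce the 12% figure.

package main

import (
	"fmt"
	"time"
)

// sample pairs a wall-clock timestamp with the CPU time the process has
// accumulated so far (both values are hypothetical, for illustration only).
type sample struct {
	wall time.Duration // simulation time at which the sample was taken
	cpu  time.Duration // total CPU time consumed by the process so far
}

// utilization returns the CPU utilization (in %) between two samples:
// the CPU time spent in the interval divided by the interval's length.
func utilization(prev, cur sample) float64 {
	return 100 * float64(cur.cpu-prev.cpu) / float64(cur.wall-prev.wall)
}

func main() {
	prev := sample{wall: 5130 * time.Millisecond, cpu: 820 * time.Millisecond}
	cur := sample{wall: 5230 * time.Millisecond, cpu: 832 * time.Millisecond}
	// 12 ms of CPU time over a 100 ms interval -> 12% at t = 5.23 s.
	fmt.Printf("%.0f%%\n", utilization(prev, cur))
}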

Related

What is the "spool time" of Intel Turbo Boost?

Just like a turbo engine has "turbo lag" due to the time it takes for the turbo to spool up, I'm curious what the "turbo lag" of Intel processors is.
For instance, the i9-8950HK in my MacBook Pro 15" 2018 (running macOS Catalina 10.15.7) usually sits around 1.3 GHz when idle, but when I run a CPU-intensive program, the CPU frequency shoots up to, say, 4.3 GHz or so (initially). The question is: how long does it take to go from 1.3 to 4.3 GHz? 1 microsecond? 1 millisecond? 100 milliseconds?
I'm not even sure whether this is up to the hardware or the operating system.
This is in the context of benchmarking some CPU-intensive code which takes a few tens of milliseconds to run. The thing is, right before this piece of CPU-intensive code runs, the CPU is essentially idle (and thus the clock speed will have dropped down to, say, 1.3 GHz). I'm wondering what slice of my benchmark runs at 1.3 GHz and what runs at 4.3 GHz: 1%/99%? 10%/90%? 50%/50%? Or even worse?
Depending on the answer, I'm thinking it would make sense to run some CPU-intensive code prior to starting the benchmark as a way to "spool up" Turbo Boost. And this leads to another question: for how long should I run this "spooling-up" code? Probably one second is enough, but what if I'm trying to minimize this -- what's a safe amount of time for the "spooling-up" code to run, to make sure the CPU will run the main code at the maximum frequency from the very first instruction executed?
The paper Evaluation of CPU frequency transition latency presents transition latencies of various Intel processors. In brief, the latency depends on the state the core is currently in and on the target state. For the evaluated Ivy Bridge processor (i7-3770 @ 3.4 GHz), the latencies varied from 23 microseconds (1.6 GHz -> 1.7 GHz) to 52 microseconds (2.0 GHz -> 3.4 GHz).
At the Hot Chips 2020 conference, a major transition-latency improvement in the upcoming Ice Lake processors was presented, which should matter most for partially vectorised code that uses AVX-512 instructions. Since these instructions do not support frequencies as high as SSE or AVX2 instructions do, an isolated island of AVX-512 code causes the processor frequency to scale down and then back up.
Pre-heating the processor obviously makes sense, as does "pre-heating" memory. One second of prior workload is enough to reach the highest available turbo frequency; however, you should also take into account the temperature of the processor, which may scale the frequency down (actually both the CPU core and uncore frequencies, if we are speaking about one of the latest Intel processors). You cannot reach the temperature limit in a second, but it depends on what you want to measure with your benchmark and whether you want to take the temperature limit into account. When speaking about the temperature limit, be aware that your processor also has a power limit, which is another possible reason for the frequency scaling down during the application run.
Another thing you should take into account when benchmarking your code is that its runtime is very short. Be aware of the reliability of runtime/resource-consumption measurements at that scale. I would suggest artificially extending the runtime (run the code 10 times and measure the overall consumption) for better results.
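As a rough illustration of the "run it several times" suggestion, here is a minimal Go sketch of my own; workload is a hypothetical stand-in for the short code being benchmarked, and the average is taken over 10 runs as suggested above.

package main

import (
	"fmt"
	"time"
)

// workload is a hypothetical stand-in for the short CPU-intensive code
// whose runtime is being measured.
func workload() {
	sum := 0
	for i := 0; i < 10_000_000; i++ {
		sum += i
	}
	_ = sum
}

func main() {
	const runs = 10
	start := time.Now()
	for i := 0; i < runs; i++ {
		workload()
	}
	total := time.Since(start)
	// Report the average over all runs to smooth out measurement noise.
	fmt.Printf("total: %v, average per run: %v\n", total, total/runs)
}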
I wrote some code to check this, with the aid of the Intel Power Gadget API. It sleeps for one second (so the CPU goes back to its slowest speed), measures the clock speed, runs some code for a given amount of time, then measures the clock speed again.
I only tried this on my 2018 15" MacBook Pro (i9-8950HK CPU) running macOS Catalina 10.15.7. The specific CPU-intensive code being run between clock speed measurements may also influence the result (is it integer only? FP? SSE? AVX? AVX-512?), so don't take these as exact numbers, but only order-of-magnitude/ballpark figures. I have no idea how the results translate into different hardware/OS/code combinations.
The minimum clock speed when idle in my configuration is 1.3 GHz. Here are the results I obtained, in tabular form.
+--------+-------------+
| T (ms) | Final clock |
| | speed (GHz) |
+--------+-------------+
| <1 | 1.3 |
| 1..3 | 2.0 |
| 4..7 | 2.5 |
| 8..10 | 2.9 |
| 10..20 | 3.0 |
| 25 | 3.0-3.1 |
| 35 | 3.3-3.5 |
| 45 | 3.5-3.7 |
| 55 | 4.0-4.2 |
| 66 | 4.6-4.7 |
+--------+-------------+
So 1 ms appears to be the minimum amount of time needed to produce any change at all. Around 10 ms gets the CPU to its nominal frequency; from then on the ramp is slower, and it apparently takes over 50 ms to reach the maximum turbo frequencies.
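Based on those ballpark numbers, one way to apply this is to spin the CPU briefly before starting the timed section. The sketch below is my own illustration and does not use the Intel Power Gadget API; spinFor and benchmarkedWork are hypothetical names, and the 100 ms warm-up is an assumption tied to the table above, not a universal value.

package main

import (
	"fmt"
	"time"
)

// spinFor busy-loops for roughly d so the CPU has a chance to ramp its
// clock up before the real measurement starts.
func spinFor(d time.Duration) {
	deadline := time.Now().Add(d)
	x := 0
	for time.Now().Before(deadline) {
		x++ // trivial work to keep the core busy
	}
	_ = x
}

// benchmarkedWork is a hypothetical stand-in for the CPU-intensive code
// under test.
func benchmarkedWork() {
	sum := 0
	for i := 0; i < 50_000_000; i++ {
		sum += i
	}
	_ = sum
}

func main() {
	// Per the table above, ~50-100 ms of warm-up reached the top turbo
	// bins on that particular machine; treat the value as machine-specific.
	spinFor(100 * time.Millisecond)

	start := time.Now()
	benchmarkedWork()
	fmt.Println("elapsed:", time.Since(start))
}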

How can I calculate the estimated completion time of both processes?

A certain computer system runs in a multi-programming environment using a non-preemptive
algorithm. In this system, two processes A and B are stored in the process queue,
and A has a higher priority than B. The table below shows estimated execution time for each
process; for example, process A uses CPU, I/O, and then CPU sequentially for 30, 60, and 30
milliseconds respectively. Which of the following is the estimated time in milliseconds
to complete both A and B? Here, the multiprocessing overhead of the OS is negligibly small. In addition, CPU and I/O operations can be executed concurrently, but I/O operations for A and B cannot be performed in parallel.
UNIT: millisecond
       CPU    I/O    CPU
A       30     60     30
B       45     45     --
Please help me. I need to explain this in front of the class tomorrow, but I can't seem to get the idea of it.
A has the higher priority, but since the system is non-preemptive, this only acts as a tiebreaker when both processes need a resource at the same time.
At t=0, A gets the CPU for 30 ms, B waits as it needs the CPU.
At t=30, A releases the CPU, B gets the CPU for 45 ms, while A gets the I/O for 60 ms.
At t=75, the CPU sits idle as B is waiting for A to finish I/O, and A is not ready to use the CPU.
At t=90, A releases I/O and gets the CPU for another 30 ms, while B gets the I/O for 45 ms.
At t=120, A releases the CPU and is finished.
At t=135, B releases I/O and is finished.
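To double-check that timeline, here is a small Go sketch of my own that turns the same event times into arithmetic (it uses the built-in max, so it needs Go 1.21 or later).

package main

import "fmt"

func main() {
	// Durations from the table, in milliseconds.
	aCPU1, aIO, aCPU2 := 30, 60, 30
	bCPU1, bIO := 45, 45

	// t=0: A (higher priority) gets the CPU; B waits.
	aCPUDone := aCPU1 // t=30
	// A starts its I/O while B gets the CPU.
	aIODone := aCPUDone + aIO    // t=90
	bCPUDone := aCPUDone + bCPU1 // t=75
	// A's second CPU burst must wait for A's I/O to finish (the CPU is
	// already free), and B's I/O must also wait for A's I/O (no parallel I/O).
	bothReady := max(aIODone, bCPUDone) // t=90
	aFinish := bothReady + aCPU2        // t=120
	bFinish := bothReady + bIO          // t=135

	fmt.Println("A finishes at", aFinish, "ms")
	fmt.Println("B finishes at", bFinish, "ms")
	fmt.Println("Both are complete at", max(aFinish, bFinish), "ms")
}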
It takes the longest path: non-preemptive (cooperative) multitasking means the processes share the CPU between them; in the worst case, the completion time is the sum of the longest demand on each resource in turn.
CPU:
B = 45 is longer than A = 30
45
I/O:
A = 60 and B = 45
45 + 60
CPU again:
A = 30
45 + 60 + 30 = 135
I will explain briefly; please elaborate for your classroom discussion.
Your answer: 135.
While process A waits for its I/O task, the CPU time is given to process B, so the completion time for processes A and B is
process A's CPU + process A's I/O (with process B's CPU overlapping it) + process B's I/O
30 + 60 + 45 = 135 ms

How to interpret CPU time vs CPU percentage

When I check the Azure monitoring tool, CPU usage is shown as CPU time:
min: 4.69 s
max: 2008.08 s
avg: 207.63 s
I am familiar with CPU %, which makes sense for an application requiring CPU cycles.
How does the time above correspond to a percentage?
What would be the maximum value in seconds that corresponds to 70% or 100% CPU usage?
Note: the CPU has 4 cores.
On a different instance, I noticed in a 60-second window:
min: 0
max: 133.83
avg: 19.61
Based on the answers below (see Nachiket's explanation in the comments as well):
133.83 is the CPU time aggregated over the cores (4 cores in my case).
CPU utilization in this case is 133.83 / (60 * 4) ≈ 55.8%.
Some cloud monitoring tools give resource usage in standard time measures (seconds, hours, days, etc.).
If you have usage in seconds, like
min: 4.69 s
max: 2008.08 s
avg: 207.63 s
then you can work out the usage in % from the definition of a percentage:
% utilization = (resource used time / total resource availability time)
For example, if the CPU was available for 100 seconds and was used for 80 of those seconds, then
% utilization = 80 / 100 = 80% CPU utilization
In the figures you give, the total available time is missing. Find that out and use the formula above:
% utilization = avg. usage / total availability
The number of cores shouldn't matter, as it appears in both the numerator and the denominator:
% utilization = (no. of cores * avg. usage) / (no. of cores * total availability)
I am not sure exactly what Azure cloud monitoring provides, but if it gives you these values, you can use them in the same way.
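For the concrete numbers in the question above, a short Go sketch of this arithmetic (the function name and layout are my own) would look like this, using the aggregate-across-cores interpretation:

package main

import "fmt"

// utilizationPercent converts aggregate CPU time (seconds, summed over all
// cores) within a measurement window into a percentage:
// % utilization = used time / total available time.
func utilizationPercent(cpuSeconds, windowSeconds float64, cores int) float64 {
	return 100 * cpuSeconds / (windowSeconds * float64(cores))
}

func main() {
	// Numbers from the question: 133.83 CPU-seconds over a 60 s window
	// on a 4-core instance.
	fmt.Printf("%.1f%%\n", utilizationPercent(133.83, 60, 4)) // ~55.8%
}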

Pipeline processor vs. Single-cycle processor

I have to compare the execution speed of the following code (see picture) on a DLX pipeline and on a single-cycle processor.
Given:
an instruction in the single-cycle model takes 800 ps
a stage in the pipeline model takes 200 ps (based on MA)
My approach was as follows.
CPU time = CPI * CC * IC
Single-cycle:
CPU time = 1 * 800 ps * 10 instr. = 8000 ps.
Pipeline:
CPI = 21 cycles / 10 instr. = 2.1 cycles per instruction
CPU time = 2.1 * 200 ps * 10 = 4200 ps.
CPU time single-cycle / CPU time pipeline = 8000 / 4200 = 1.9, so the pipelined code runs 1.9 times faster.
But I was told I have to work with clock cycles and not with time -- "It doesn't matter how much time a CC takes".
I don't see how to make a comparison otherwise. Could you please help me?
Your analysis is indeed correct, but I guess your professor is looking for an explanation like this:
Suppose the single-cycle processor also has the stages that you mentioned, namely IF, ID, EX, MA and WB, and that an instruction spends roughly the same time in each stage as in the pipelined version. Now you can draw a pipeline diagram for this single-cycle processor and see that it would take 50 cycles (since it can work on only one instruction at a time), compared to the 19 cycles on a pipelined processor.
Again, I prefer the way you have analyzed it (the single-cycle processor wouldn't really have each of those stages in a different clock cycle; it would just have one very long clock cycle covering all the stages). Also, you haven't mentioned whether this is a stalling-only MIPS pipeline (for which your answer is correct) or a bypassed MIPS pipeline. If it is the latter, you can shave off a few more cycles and get it down to 15 cycles.
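As a quick sanity check on the numbers above, here is a small Go sketch of my own that computes both the time-based and the cycle-based comparison; the 21-cycle count comes from the question, and the answer's 19-cycle count would simply replace it.

package main

import "fmt"

func main() {
	const (
		instructions   = 10    // instruction count from the question
		singleCyclePs  = 800.0 // one instruction on the single-cycle machine
		stagePs        = 200.0 // one pipeline stage
		pipelineCycles = 21    // cycle count from the question (the answer counts 19)
	)

	// Time-based comparison (the question's approach).
	singleTime := instructions * singleCyclePs
	pipeTime := pipelineCycles * stagePs
	fmt.Printf("time-based speedup:  %.1fx\n", singleTime/pipeTime)

	// Cycle-based comparison (the professor's approach): pretend the
	// single-cycle machine also walks through all 5 stages per instruction.
	singleCycles := instructions * 5.0
	fmt.Printf("cycle-based speedup: %.1fx\n", singleCycles/pipelineCycles)
}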

Getting CPU usage and calculating the % used

I need to calculate the CPU usage and aggregate it from the proc file in Linux.
/proc/stat gives me data, but how would I find out the % of CPU used at a given time? As far as I understand, stat gives me the count of processes running on each core at any time, which does not give me any idea of the % use of the CPU.
I am coding this in Go and have to do it without scripts.
Thanks in advance!
/proc/stat does not only give you the count of processes on each core. man proc will tell you the exact format of that file. Copied from it, here is the part you should be interested in:
/proc/stat
cpu 3357 0 4313 1362393
The amount of time, measured in units of USER_HZ
(1/100ths of a second on most architectures, use
sysconf(_SC_CLK_TCK) to obtain the right value), that the
system spent in user mode, user mode with low priority
(nice), system mode, and the idle task, respectively.
The last value should be USER_HZ times the second entry
in the uptime pseudo-file.
It is then easy to take the difference of the idle field between two measurements, which gives you the time this CPU spent doing nothing. The other value you can extract is the time spent doing something, which is the difference between two measurements of:
time in user mode + time in user mode with low priority (nice) + time in system mode
You then have two values: A, the time spent doing nothing, and B, the time actually spent doing something. B / (A + B) gives you the percentage of time the CPU was busy.
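Since the question asks for Go specifically, here is a minimal sketch of that approach, of my own making; it assumes it is enough to look at the first four fields of the aggregate cpu line (newer kernels report more fields, such as iowait and irq, which this sketch ignores).

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readCPUSample reads the aggregate "cpu" line from /proc/stat and returns
// the busy time (user + nice + system) and the idle time, in USER_HZ ticks.
func readCPUSample() (busy, idle uint64, err error) {
	data, err := os.ReadFile("/proc/stat")
	if err != nil {
		return 0, 0, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		// The first line, labelled "cpu", aggregates all cores.
		if len(fields) < 5 || fields[0] != "cpu" {
			continue
		}
		var vals [4]uint64
		for i := 0; i < 4; i++ {
			vals[i], err = strconv.ParseUint(fields[i+1], 10, 64)
			if err != nil {
				return 0, 0, err
			}
		}
		// user + nice + system = busy; the fourth field is idle.
		return vals[0] + vals[1] + vals[2], vals[3], nil
	}
	return 0, 0, fmt.Errorf("cpu line not found in /proc/stat")
}

func main() {
	busy1, idle1, err := readCPUSample()
	if err != nil {
		panic(err)
	}
	time.Sleep(1 * time.Second) // interval between the two measurements
	busy2, idle2, err := readCPUSample()
	if err != nil {
		panic(err)
	}
	b := float64(busy2 - busy1) // B: time doing something
	a := float64(idle2 - idle1) // A: time doing nothing
	fmt.Printf("CPU busy: %.1f%%\n", 100*b/(a+b))
}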
