Sysbench:which value should I look at for CPU benchmarking? - performance

sysbench version: 1.0.7
OS: macOS 10.11.6
No matter where I ran sysbench cpu run I get very similar results like the following.
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Prime numbers limit: 10000
Initializing worker threads...
Threads started!
General statistics:
total time: 10.0005s
total number of events: 9083
Latency (ms):
min: 0.96
avg: 1.10
max: 7.18
95th percentile: 1.34
sum: 9995.18
Threads fairness:
events (avg/stddev): 9083.0000/0.00
execution time (avg/stddev): 9.9952/0.00
I read some blog posts and all say that I should look at total time but it's always 10 sec in different platform/env. I also get the very similar result with very small prime number list e.g. --cpu-max-prime=100. I also run with --time=0 and the benchmark never finishes.
My guess is the total time matches the value specified with --time option but then I don't know what's the right command to use.
Thanks in advance

Take --cpu-max-prime big enough, for example let it be 20000. As for sysbench, it looks like it always run 10 sec now, so you should see "total number of events" value (high value means better performance). To get correct total performance of your server CPU, you should put --num-threads=[number of hyperthreads from cpuinfo] too.

You can set --max-time to indicate a different max time (in seconds) for the test. Default is 10 seconds.
You can see the full manual here: http://imysql.com/wp-content/uploads/2014/10/sysbench-manual.pdf

Always read man sysbench for different parameters you needed.
Use example from bottom, to have a perspective of cpu performance
sysbench --time=60 --resoults-interval=10 --threads=8 cpu run

Related

Docker Container CPU usage Monitoring

As per the documentation of docker.
We can get CPU usage of docker container with docker stats command.
The column CPU % will give the percentage of the host’s CPU the container is using.
Let say I limit the container to use 50% of hosts single CPU. I can specify 50% single CPU core limit by --cpus=0.5 option as per https://docs.docker.com/config/containers/resource_constraints/
How can we get the CPU% usage of container out of allowed CPU core by any docker command?
E.g. Out of 50% Single CPU core, 99% is used.
Is there any way to get it with cadvisor or prometheus?
How can we get the CPU% usage of container out of allowed CPU core by any docker command? E.g. Out of 50% Single CPU core, 99% is used.
Docker has docker stats command which shows CPU/Memory usage and few other stats:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
c43f085dea8c foo_test.1.l5haec5oyr36qdjkv82w9q32r 0.00% 11.15MiB / 100MiB 11.15% 7.45kB / 0B 3.29MB / 8.19kB 9
Though it does show memory usage regarding the limit out of the box, there is no such feature for CPU yet. It is possible to solve that with a script that will calculate the value on the fly, but I'd rather chosen the second option.
Is there any way to get it with cadvisor or prometheus?
Yes, there is:
irate(container_cpu_usage_seconds_total{cpu="total"}[1m])
/ ignoring(cpu)
(container_spec_cpu_quota/container_spec_cpu_period)
The first line is a typical irate function that calculates how much of CPU seconds a container has used. It comes with a label cpu="total", which the second part does not have, and that's why there is ignoring(cpu).
The bottom line calculates how many CPU cores a container is allowed to use. There are two metrics:
container_spec_cpu_quota - the actual quota value. The value is computed of a fraction of CPU cores that you've set as the limit and multiplied by container_spec_cpu_period.
container_spec_cpu_period - comes from CFS Scheduler and it is like a unit of the quota value.
I know it may be hard to grasp at first, allow me to explain on an example:
Consider that you have container_spec_cpu_period set to the default value, which is 100,000 microseconds, and container CPU limit is set to half a core (0.5). In this case:
container_spec_cpu_period 100,000
container_spec_cpu_quota 50,000 # =container_spec_cpu_period*0.5
With CPU limit set to two cores you will have this:
container_spec_cpu_quota 200,000
And so by dividing one by another we get the fraction of CPU cores back, which is then used in another division to calculate how much of the limit is used.

How to interpret cpu profiling graph

I was following the go blog here
I tried to profile my program but it looks a bit different. (Seems that go has moved from sampling to instrumentation?)
I wonder what these numbers mean
Especially showing nodes accounting for 2.59s, 92.5% of 2.8
What does total sample = 2.8s mean? The sample is drawn in an interval of 2.8 seconds?
Does it mean that only nodes that are running over 92.5% of sample
time are shown?
Also I wonder these numbers are generated. In the original go blog, the measure is how many times the function is detected in execution among all samples. However, we are dealing with seconds here. How does go profiling tool know how many seconds a function call takes.
Any help will be appreciated
Think of the graph as a graph of a resource, time. You'll start at the top with, for example, 10 seconds. Then you'll see that 5 seconds went to time.Sleep and 5 went to encoding/json. The particular divides in that time is represented by the arrows, so they show that 5 went to each part of the program. So now we have 3 nodes, the first node 10 seconds, time.Sleep 5 seconds, and encoding/json 5 seconds. Then those 5 seconds in encoding/json are broken down even further into the functions that took up most the time. The 0.01s (percentage) out of 0.02s (larger percentage) means that this function took 0.01s of processing time out of a total of 0.02s of the block of time (the arrow with the number) total by this particular call stack. The percentage represents the overall percentage of execution time this part took up from the whole pie. So you'll see that encoding/json string/encoder took 0.36 percent of the total execution time/resources of your program.

Golang - What is the meaning of the seconds in CPU profiling graph?

For example, the data in the figure runtime.scanobject:
13.42s
runtime.scanobject 9.69s(4.51%) of 18.30s(8.52%).
5.33s
what is the meaning of the seconds and percent?
Thanks.
When CPU profiling is enabled, the Go program stops about 100 times per second and records a sample consisting of the program counters on the currently executing goroutine's stack.
That time and percentange is in reference to the sample.
Here is a nice reference for you to read more about it: https://blog.golang.org/profiling-go-programs

What is the performance of 10 processors capable of 200 MFLOPs running code which is 10% sequential and 90% parallelelizable?

simple problem from Wilkinson and Allen's Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Working through the exercises at the end of the first chapter and want to make sure that I'm on the right track. The full question is:
1-11 A multiprocessor consists of 10 processors, each capable of a peak execution rate of 200 MFLOPs (millions of floating point operations per second). What is the performance of the system as measured in MFLOPs when 10% of the code is sequential and 90% is parallelizable?
I assume the question wants me to find the number of operations per second of a serial processor which would take the same amount of time to run the program as the multiprocessor.
I think I'm right in thinking that 10% of the program is run at 200 MFLOPs, and 90% is run at 2,000 MFLOPs, and that I can average these speeds to find the performance of the multiprocessor in MFLOPs:
1/10 * 200 + 9/10 * 2000 = 1820 MFLOPs
So when running a program which is 10% serial and 90% parallelizable the performance of the multiprocessor is 1820 MFLOPs.
Is my approach correct?
ps: I understand that this isn't exactly how this would work in reality because it's far more complex, but I would like to know if I'm grasping the concepts.
Your calculation would be fine if 90% of the time, all 10 processors were fully utilized, and 10% of the time, just 1 processor was in use. However, I don't think that is a reasonable interpretation of the problem. I think it is more reasonable to assume that if a single processor were used, 10% of its computations would be on the sequential part, and 90% of its computations would be on the parallelizable part.
One possibility is that the sequential part and parallelizable parts can be run in parallel. Then one processor could run the sequential part, and the other 9 processors could do the parallelizable part. All processors would be fully used, and the result would be 2000 MFLOPS.
Another possibility is that the sequential part needs to be run first, and then the parallelizable part. If a single processor needed 1 hour to do the first part, and 9 hours to do the second, then it would take 10 processors 1 + 0.9 = 1.9 hours total, for an average of about (1*200 + 0.9*2000)/1.9 ~ 1053 MFLOPS.

How is CPU time measured on Windows?

I am currently creating a program which identifies processes which are hung/out-of-control, and using an entire CPU core. The program then terminates them, so the CPU usage can be kept under control.
However, I have run into a problem: When I execute the 'tasklist' command on Windows, it outputs this:
Image Name: Blockland.exe
PID: 4880
Session Name: Console
Session#: 6
Mem Usage: 127,544 K
Status: Running
User Name: [removed]\[removed]
CPU Time: 0:00:22
Window Title: C:\HammerHost\Blockland\Blockland.exe
So I know that the line which says "CPU Time" is an indication of the total time, in seconds, used by the program ever since it started.
But let's suppose there are 4 CPU cores on the system. Does this mean that it used up 22 seconds of one core, and therefore used 5.5 seconds on the entire CPU in total? Or does this mean that the process used up 22 seconds on the entire CPU?
It's the total CPU time across all cores. So, if the task used 10 seconds on one core and then 15 seconds later on a different core it would report 25 seconds. If it used 5 seconds on all four cores simultaneously, it would report 20 seconds.

Resources