Prometheus - Convert cpu_user_seconds to CPU Usage %? - performance

I'm monitoring Docker containers via Prometheus.io. My problem is that I'm only getting cpu_user_seconds_total or cpu_system_seconds_total.
How to convert this ever-increasing value to a CPU percentage?
Currently I'm querying:
rate(container_cpu_user_seconds_total[30s])
But I don't think it is quite correct (compared to top).
How to convert cpu_user_seconds_total to CPU percentage? (Like in top)

rate() returns a per-second value, so multiplying by 100 will give a percentage:
rate(container_cpu_user_seconds_total[30s]) * 100

I also found this query to be an accurate way to get CPU usage:
100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node",mode="idle"}[5m])) * 100)
From: http://www.robustperception.io/understanding-machine-cpu-usage/

Note that container_cpu_user_seconds_total and container_cpu_system_seconds_total are per-container counters, which show the CPU time used by a particular container in user space and in kernel space respectively (see these docs for more details). cAdvisor exposes an additional metric, container_cpu_usage_seconds_total, which equals the sum of container_cpu_user_seconds_total and container_cpu_system_seconds_total, i.e. it shows the overall CPU time used by each container. See these docs for more details.
container_cpu_usage_seconds_total is a counter, i.e. it increases over time. This isn't very informative for determining CPU usage at a particular time. Prometheus provides the rate() function, which returns the average per-second increase rate of a counter. For example, the following query returns the average per-second increase of the per-container container_cpu_usage_seconds_total metrics over the last 5 minutes (see the 5m lookbehind window in square brackets):
rate(container_cpu_usage_seconds_total[5m])
This is basically the average number of CPU cores used during the last 5 minutes. Just multiply it by 100 in order to get CPU usage in %. Note that the resulting value may exceed 100% if the container uses more than a single CPU core during the last 5 minutes.
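To make this concrete, here is a toy Python sketch of the arithmetic behind rate() (the counter samples are made up, and Prometheus's extrapolation details inside rate() are ignored):
# two samples of container_cpu_usage_seconds_total, 5 minutes apart (made-up values)
t0, v0 = 0.0, 1234.0                 # timestamp (s), CPU seconds consumed so far
t1, v1 = 300.0, 1534.0
cores_used = (v1 - v0) / (t1 - t0)   # average CPU cores over the window -> 1.0
cpu_percent = cores_used * 100       # 100% == one full core; can exceed 100%
print(cores_used, cpu_percent)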
The rate(container_cpu_usage_seconds_total[5m]) query usually returns a huge number of time series with many long labels in production Kubernetes clusters, so it is better to aggregate it with the following queries:
The average number of CPU cores used during the last 5 minutes per each pod:
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
The average number of CPU cores used during the last 5 minutes per each node:
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (node)
The average number of CPU cores used during the last 5 minutes per each namespace:
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)
The container!="" filter removes superfluous metrics related to cgroups hierarchy - see this answer for more details.

For Windows Users - wmi_exporter
100 - (avg by (instance) (irate(wmi_cpu_time_total{mode="idle"}[2m])) * 100)

Related

Unable to analyse Throughput Performance of One API which has been migrated to new platform

I am checking the performance of one API that runs on two systems. Since the API has been migrated to a new platform, I am comparing its performance against the old system.
Statistics are shown below:

New System:
Threads - 25
Ramp-up - ~25
Avg - 8 sec
Median - 7.8 sec
95th percentile - 8.8 sec
Throughput - 0.39

Old System:
Threads - 25
Ramp-up - ~25
Avg - 10 sec
Median - 10 sec
95th percentile - 10 sec
Throughput - 0.74
Here we can observe that the new system took less time for 25 threads than the old system, yet the old system, which took more time, shows the higher throughput.
I am confused about the throughput: which system is more efficient?
The one that took less time should have the higher throughput, but here the faster system has the lower throughput, which confuses me.
Can anyone help me here?
As per JMeter Glossary:
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time)
So double check the total test duration for both test executions; my expectation is that the "new system" test took longer.
With regards to the reason, I cannot state anything meaningful without seeing the full .jtl results files for both executions. I can only assume that there was one very long request in the "new system" test, or that you have a Timer with random think time somewhere in your test plan.
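To make the formula concrete, here is a toy Python sketch (the request count and the total durations below are assumptions chosen to roughly reproduce the reported throughputs, not figures taken from the question):
# Throughput = (number of requests) / (total time from first sample start to last sample end)
requests = 25 * 10                   # assumed: 25 threads x 10 iterations each
old_total_time_s = 340.0             # assumed total duration of the old-system run
new_total_time_s = 640.0             # assumed total duration of the new-system run
print(requests / old_total_time_s)   # ~0.74/s: higher throughput despite slower responses
print(requests / new_total_time_s)   # ~0.39/s: lower throughput because the run simply lasted longer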

java.lang.OutOfMemoryError: Direct buffer memory during Druid ingestion task

I'm ingesting data into Druid using Kafka ingestion task.
The test data is 1 message/second. Each message has 100 numeric columns and 100 string columns. Number values are random. String values are taken from a pool of 10k random 20 char strings. I have sum, min and max aggregations for each numeric column.
Config is the following:
Segment granularity: 15 mins.
Intermediate persist period: 2 mins.
druid.processing.buffer.sizeBytes=26214400
druid.processing.numMergeBuffers=2
druid.processing.numThreads=1
The Druid docs say that sane max direct memory size is
(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) *
druid.processing.buffer.sizeBytes
where "The + 1 factor is a fuzzy estimate meant to account for the segment decompression buffers and dictionary merging buffers."
According to the formula I need 100 MB of direct memory but I get java.lang.OutOfMemoryError: Direct buffer memory even when I set max direct memory to 250 MB. This error is not consistent: sometimes I have this error, sometimes I don't.
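For reference, here is the arithmetic behind that 100 MB figure (a quick sketch plugging in the config above):
# sane max direct memory = (numThreads + numMergeBuffers + 1) * buffer.sizeBytes
buffer_size_bytes = 26214400                      # 25 MiB, from druid.processing.buffer.sizeBytes
num_threads = 1
num_merge_buffers = 2
sane_max_direct = (num_threads + num_merge_buffers + 1) * buffer_size_bytes
print(sane_max_direct)                            # 104857600 bytes = 100 MiB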
My target is to calculate max direct memory before I start the task and to not get the error during task execution. My guess is that I need to calculate this "+1 factor" precisely. How can I do this?
In my experience that formula has been pretty good, with the exception of being careful that a MB is not 1000 KB but 1024 KB. But I am quite surprised it still gave you the error with 250 MB. How are you setting the direct memory size? And are you using a MiddleManager with Peons? The Peons do the actual work, and you have to set the max direct memory on the Peons, not the Middle Manager. You do it with the following parameter in the Middle Manager's runtime.properties. This is what I have on mine:
druid.indexer.runner.javaOptsArray=["-server", "-Xms200m", "-Xmx200m", "-XX:MaxDirectMemorySize=220m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager", "-XX:+ExitOnOutOfMemoryError", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/var/log/druid/task/"]
You also have to set the other processing properties this way too:
druid.indexer.fork.property.druid.processing.buffer.sizeBytes
druid.indexer.fork.property.druid.processing.numMergeBuffers
druid.indexer.fork.property.druid.processing.numThreads
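For example, mirroring the processing settings from the question in the Middle Manager's runtime.properties (these particular values are simply the ones in the question, not tuned recommendations):
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=26214400
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.numThreads=1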

I cannot increase the throughput to the number I want

I am trying to stress test my server.
To do so I am using Jmeter and here is my set up:
I use the following setup (see the "my Setup" screenshot):
Threads: 1000
Schedule: 3 mins
So as you can see, I keep 1000 threads going for a period of 3 minutes.
But when I look at the throughput I only get around 230 per second (see the "results" screenshot).
So what should I do to increase the throughput to, for example, 1000000 per second? How come increasing the thread count, which I assume means more load, does not increase the throughput?
According to the JMeter Glossary:
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
Throughput explicitly relies on the application's response time. Looking into your results, the average response time is 3.5 seconds, therefore you will not get more than 1000 / 3.5 = 285 requests per second.
Theoretically you could use a Throughput Shaping Timer and Concurrency Thread Group combination; this way JMeter will kick off extra threads if the current amount is not enough to reach/maintain the desired throughput. However, looking at the 8.5% error rate and a maximum response time for your application of over 2 minutes, my expectation is that you will not be able to get more throughput, because most probably your application is overloaded and cannot respond faster.
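A rough Python sketch of that upper bound (the 3.5 second average response time is the figure read from your results above):
# with synchronous samplers, each thread completes at most 1/response_time requests per second
threads = 1000
avg_response_time_s = 3.5
max_throughput = threads / avg_response_time_s    # ~285 requests/second
print(max_throughput)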
Throughput measures the number of transactions or requests that can be made in a given period of time. Basically, it is the number of requests the server managed to serve in that period. The throughput value depends on a lot of factors, and maybe your application under test is not able to handle the expected load.
So with 1000 threads, you can't simply expect a throughput of 1000.
It's up to you to find out how much throughput your application can handle. For that you may need different optimizations on your side, like optimizing your script, distributing the load across several JMeter machines, increasing the thread count, etc.

How to increase greenplum concurrency and # query per sec

We have a fairly big Greenplum v4.3 cluster. 18 hosts, each host has 3 segment nodes. Each host has approx 40 cores and 60G memory.
The table we have is 30 columns wide and has 0.1 billion rows. The query we are testing has a 3-10 sec response time when there is no concurrency pressure. As we increase the number of queries we fire in parallel, the latency increases from an average of 3 secs to around 50 secs, as expected.
But we've found that regardless of how many queries we fire in parallel, we only get a very low QPS (queries per second), almost just 3-5 queries/sec. We've set max_memory=60G, memory_limit=800MB, and active_statements=100, hoping the CPU and memory would be highly utilized, but they are still poorly used, around 30%-40%.
I have a strong feeling that we are not feeding the cluster in parallel effectively, even though we hoped to get the best out of the CPU and memory utilization. It doesn't work as we expected. Is there anything wrong with the settings, or is there anything else I am not aware of?
There might be multiple reasons for such behavior.
Firstly, every Greenplum query uses no more than one processor core per logical segment. Say you have 3 segments on every node with 40 physical cores: running two parallel queries will utilize at most 2 x 3 = 6 cores on every node, so you would need about 40 / 3 ≈ 13-14 parallel queries to utilize all of your CPUs. So maybe, for your number of cores per node, it is better to create more segments (gpexpand can do this). By the way, are the tables used in the queries compressed?
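A quick sketch of that core math (a toy Python calculation using the numbers from the question and the one-core-per-segment-per-query assumption above):
# one query uses at most one core per primary segment on each host
cores_per_host = 40
segments_per_host = 3
cores_per_query_per_host = segments_per_host                           # = 3
queries_to_saturate = -(-cores_per_host // cores_per_query_per_host)   # ceil(40 / 3) = 14
print(cores_per_query_per_host, queries_to_saturate)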
Secondly, it may be a bad query. If you provide the query plan, it may help to understand. There are some query types in Greenplum where the master can become a bottleneck.
Finally, there might be some bad OS or block device settings.
I think the documentation page Managing Resources might help you manage your resources.
You can use Resource Groups to limit/control your resources, especially the concurrency attribute (the maximum number of concurrent transactions, including active and idle transactions, that are permitted in the resource group).
Resource queues help limit ACTIVE_STATEMENTS.
Note: ACTIVE_STATEMENTS is the total number of statements currently running; when you have queries that cost ~50 s each and more queries keep arriving, this setting alone may not work well - maybe 5 * 50 is better.
Also, you need to configure the memory/CPU settings so that your queries can proceed.

How is total throughput value calculated in Aggregate Report?

I discovered that in the Aggregate Report the TOTAL throughput value depends on the thread count. If we run tests with only one thread, total throughput is calculated as 1 / Total Average (multiplied by 1000 to convert milliseconds to seconds, see the screenshot below).
But when we set the thread count to 2 or more, total throughput is calculated in some way I can't figure out. What I want to know is which formula is used to calculate total throughput in this case (thread count > 1), because it does not seem to be an average of the individual requests' throughput, and it is also not calculated as 1 / Total Average as described in the first case. So how exactly does this work? (Screenshot for 2 threads attached below.)
Thanks.
Screenshot for 1 thread used:
aggregate_1_thread.png
Screenshot for 2 threads used:
aggregate_2_threads.png
As per the doc:
http://jmeter.apache.org/usermanual/component_reference.html#Aggregate_Report
Throughput - the Throughput is measured in requests per second/minute/hour. The time unit is chosen so that the displayed rate is at least 1.0. When the throughput is saved to a CSV file, it is expressed in requests/second, i.e. 30.0 requests/minute is saved as 0.5.
So the result depends on both the response time and the number of threads, which influences those response times.
The total number of requests is divided by the time taken to run them, see:
https://github.com/apache/jmeter/blob/trunk/src/core/org/apache/jmeter/visualizers/SamplingStatCalculator.java#L198
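In other words, roughly the following calculation (a minimal Python sketch with made-up sample timestamps):
# Throughput = total samples / (end of last sample - start of first sample)
start_times_s = [0.0, 0.1, 1.2, 1.3]      # made-up start times of 4 samples across 2 threads
durations_s   = [0.8, 0.9, 0.7, 1.0]
total_requests = len(start_times_s)
window_s = (start_times_s[-1] + durations_s[-1]) - start_times_s[0]
print(total_requests / window_s)          # requests per second, as shown in the Aggregate Report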
