Aggregating Counts per min using graphite functions with codahale counter data - metrics

Our ecosystem right now is graphite/grafana and we use the codahale metrics java library.
I define a counter
requestCounter = registry.counter(MetricNamespaces.REQUEST_COUNT);
and increment on every request hit to our app
requestCounter.inc();
What we observed with codahale is that the counter is a cumulative value. When we look at the raw data in grafana, it is an ever-increasing value over time.
What functions do I use in graphite so that I can get the request count per minute?
I tried this
alias(summarize(perSecond(sumSeries(app.request.count.*)), '1m', 'sum', false), 'Request Count')
and also this
hitcount(perSecond(app.request.count.*), '1m')
Neither seems right. Can someone please advise on the recommended way, and also whether we can have codahale send just the raw increments instead of a cumulative count?

You should use the nonNegativeDerivative function of the graphite API if you want to see the rate of a counter:
nonNegativeDerivative(sumSeries(app.request.count.*))
Note that you also need to configure your graphite retention policy to match your metrics. Otherwise, if the resolution of your metrics does not fit the way they are sent from codahale, you'll get weird, incorrectly scaled results.
For example, in our company codahale is configured to send data every two seconds. The graphite retention policy is 1 second for the first 6 hours and then 10 seconds. If we try to look at results beyond 6 hours, they're scaled incorrectly. I actually got to this question when trying to solve this issue. I'll update here when I have an answer.
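Putting that together with the summarize attempt from the question, a per-minute request count can be plotted with something along these lines (a sketch, not verified against your schema):
alias(summarize(nonNegativeDerivative(sumSeries(app.request.count.*)), '1m', 'sum', false), 'Requests per minute')
And the retention point above would translate into a storage-schemas.conf entry whose finest resolution matches the reporter's send interval, for example (illustrative pattern and values, not from the thread):
[app_request_counts]
pattern = ^app\.request\.count\.
retentions = 2s:6h,10s:30d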

Related

Is there any time shift between jmeter and influxdb?

Just starting with jmeter and making some experiments, I found something that looks kind of odd to me. I connected jmeter with influxdb and measured the average response time of one single request in an infinite loop. When I stopped the test I realized that the last timestamp in the results csv created by jmeter is not the same as the one taken by influxdb. Specifically, jmeter's last measurement is 13s later than the one registered by influxdb. Any ideas on what could be happening?
I've tried to google it but haven't found any related documentation or reported problem.
JMeter sends aggregated metrics; that is, it doesn't send each and every SampleResult but collects the results within a "window" whose default value is 5 seconds, controllable via the backend_influxdb.send_interval JMeter property.
The metrics which are sent are described here.
You can try decreasing the 5-second window by amending the aforementioned backend_influxdb.send_interval JMeter property, e.g. setting it to 1000 ms, so JMeter sends the data more often. However, this will create extra overhead, so make sure JMeter has enough headroom to operate and that the increased sending rate doesn't affect the overall throughput.
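For reference, the property can be overridden in user.properties or on the command line; assuming its value is interpreted in seconds (the shipped default is 5, per the answer above), a 1-second send interval would look like this (test_plan.jmx is just a placeholder):
# user.properties
backend_influxdb.send_interval=1
# or when launching the test:
jmeter -n -t test_plan.jmx -Jbackend_influxdb.send_interval=1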

Monitoring a frequently changing value using Micrometer and Prometheus, with restrictions

Background:
I have a Spring Boot application, and I want to monitor the max and average number of requests per minute.
Since the server assigns a thread to each request, I can observe the thread count. I use micrometer to expose the metric and prometheus to pull it. I chose the Gauge type to track the thread count.
AtomicInteger concurrentNumber = meterRegistry.gauge("concurrent_thread_number", new AtomicInteger(0));
But in prometheus, the gauge value remains 0 even while I make lots of requests to the spring boot application.
I have found the reason.
My application handles requests quickly; it may finish processing many requests in 1 second. I set the prometheus scrape_interval to 30s (because I don't want my machines to suffer a heavy load). So between scrapes, the thread count goes from 0 up to 1, 2, 3, 4, ... and finally back to 0, which means the sampled values are always 0.
My Question:
I don't want to shorten the scrape_interval. Is there any trick to monitor the max and average number of requests per minute while the scrape_interval stays at 30s? Maybe by choosing another type of metric instead of a gauge? Any advice would be appreciated.
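For context, here is a minimal sketch of how a gauge like the one above is typically wired around request handling; the class and method names are hypothetical, not from the question:
import io.micrometer.core.instrument.MeterRegistry;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentRequestGauge {
    private final AtomicInteger concurrentNumber;

    public ConcurrentRequestGauge(MeterRegistry meterRegistry) {
        // The gauge reports whatever value the AtomicInteger holds at scrape time,
        // which is why it reads 0 whenever all in-flight requests finish between two scrapes.
        this.concurrentNumber = meterRegistry.gauge("concurrent_thread_number", new AtomicInteger(0));
    }

    public void onRequestStart() { concurrentNumber.incrementAndGet(); }

    public void onRequestEnd() { concurrentNumber.decrementAndGet(); }
}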

Grafana: Get last N minutes events count

I use Spring Micrometer to count every occurrence of a specific event (using counter).
How can I get the difference between the count now and the count N minutes ago? I need to know how many events occurred in the last N minutes.
In Grafana I can find only count, m1_rate, m5_rate, m15_rate and mean_rate.
It depends on your datasource. I don't know Micrometer, but looking at the docs, it seems it publishes metrics to Prometheus, so that is your datasource. If that's correct, you could use something like count_over_time(metric[1h]). That gives you the number of samples for that metric in the specified time interval. I think "m1_rate" and the others are metrics created by Micrometer.
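If the data really is in Prometheus and the metric is a counter, the change over the last 10 minutes can also be expressed with the increase function (the metric name below is hypothetical; Prometheus counters usually use underscores and a _total suffix):
increase(path_to_metric_count_total[10m])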
This is what I was looking for: the change of the counter value over the last 10 minutes.
diffSeries(sum(path.to.metric.count),timeShift(sum(path.to.metric.count),'10min',true,false))

prometheus query for continuous uptime

I'm a prometheus newbie and have been trying to figure out the right query to get the last continuous uptime for my service.
For example, if the present time is 0:01:20 my service was up at 0:00:00, went down at 0:01:01 and went up again at 0:01:10, I'd like to see the uptime of "10 seconds".
I'm mainly looking at the "up{}" metric and possibly combine it with the functions (changes(), rate(), etc.) but no luck so far. I don't see any other prometheus metric similar to "up" either.
The problem is that you need something which tells you when your service was actually up, as opposed to whether the node was up :)
We use the following (I hope one of them, or at least the general idea, will help):
1. When we look at a host we use node_time{...} - node_boot_time{...}
2. When we look at a specific process / container (docker via cadvisor in our case) we use node_time{...} - on(instance) group_right container_start_time_seconds{name=~"..."}
The following PromQL query can be used for calculating the application uptime in seconds:
time() - process_start_time_seconds
This query works for applications written in Go that use either the github.com/prometheus/client_golang or github.com/VictoriaMetrics/metrics client library, both of which expose the process_start_time_seconds metric by default. This metric contains the unix timestamp of the application start time.
Kubernetes exposes the container_start_time_seconds metric for each started container by default. So the following query can be used for tracking uptimes for containers in Kubernetes:
time() - container_start_time_seconds{container!~"POD|"}
The container!~"POD|" filter is needed in order to filter aux time series:
Time series with container="POD" label reflect e.g. pause containers - see this answer for details.
Time series without container label correspond to e.g. cgroups hierarchy. See this answer for details.
If you need to calculate the overall per-target uptime over a given time range, it is possible to estimate it with the up metric. Prometheus automatically generates an up metric for each scrape target, setting it to 1 on each successful scrape and to 0 otherwise. See these docs for details. So the following query can be used for estimating the total uptime in seconds per scrape target during the last 24 hours:
avg_over_time(up[24h]) * (24*3600)
See avg_over_time docs for details.
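As a variation on the same idea, the fraction can be turned into an uptime percentage over the window (the job label is just an example):
avg_over_time(up{job="my-service"}[24h]) * 100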

JMeter throughput value in grafana

I am referring to the "http://www.testautomationguru.com/jmeter-real-time-results-influxdb-grafana/" article.
I went through grafana + influxdb, but there is a jmeter TPS (throughput) value that I do not know how to get.
I tried "jmeter.all.h.count" but it did not seem to be the value I wanted.
I wrote the blog you were referring to!
We cannot expect the backend listener metrics to give you results as accurate as the aggregate report (especially percentiles, averages, etc.).
The Backend Listener basically gives the metrics over time, so you should plot the graph using the data over time. If you try to use a Grafana single-stat panel with that data, you will see a complete mismatch.
In the blog I was using a modified apache_core.jar library to get the results you are actually expecting. However, I stopped sharing the modified lib (after jmeter 2.13 / jmeter 3.0).
You are referring to the wrong component for these values.
For GraphiteBackendListenerClient, sent values are described here:
http://jmeter.apache.org/usermanual/realtime-results.html
For InfluxdbBackendListenerClient, sent values can be found here:
https://github.com/apache/jmeter/blob/trunk/src/components/org/apache/jmeter/visualizers/backend/influxdb/InfluxdbBackendListenerClient.java
Neither component sends throughput, as it can be computed in Grafana from the other metrics.
Found your answer in the influxdb grafana tutorial.
Use jmeter.all.a.count:
If I query "jmeter.all.a.count", which holds the number of requests processed by the server every second, I get the output below. [Number of requests processed / unit time = Throughput]
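If that series is exposed through a Graphite-style datasource (as the dotted metric name suggests), one way to plot it as per-minute throughput is to summarize the raw counts, for example (a sketch, not from the tutorial):
alias(summarize(jmeter.all.a.count, '1m', 'sum', false), 'Requests per minute')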
