Grafana: Get last N minutes events count - spring

I use Spring Micrometer to count every occurrence of a specific event (using a counter).
How can I get the difference between the counts now and N minutes ago? I need to know how many events occurred in the last N minutes.
In Grafana I can only find count, m1_rate, m5_rate, m15_rate and mean_rate.
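For reference, the counter is created and incremented along these lines - a minimal sketch, assuming a MeterRegistry is injected and using a placeholder metric name:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

public class EventMetrics {
    private final Counter eventCounter;

    public EventMetrics(MeterRegistry registry) {
        // "my_event_count" is a placeholder; use whatever name you actually registered
        this.eventCounter = registry.counter("my_event_count");
    }

    public void onEvent() {
        eventCounter.increment();   // one increment per occurrence of the event
    }
}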

It depends on your datasource. I don't know Micrometer, but looking at the docs, it seems it publishes metrics to Prometheus, so that is your datasource. If that's correct, you could use something like count_over_time(metric[1h]). That gives you the number of samples for that metric in the specified time interval. I think "m1_rate" and the others are metrics created by Micrometer.

This is what I was looking for - the change of the counter value over the last 10 minutes.
diffSeries(sum(path.to.metric.count),timeShift(sum(path.to.metric.count),'10min',true,false))

Related

Monitoring a frequently changing value using Micrometer and Prometheus with restrictions

Background:
I have a Spring Boot application, and I want to monitor the max and average number of requests per minute.
Since the server assigns a thread to each request, I can observe the thread count. I use Micrometer to expose the metric and Prometheus to pull the metrics. I chose the Gauge type to track the thread count.
AtomicInteger concurrentNumber = meterRegistry.gauge("concurrent_thread_number", new AtomicInteger(0));
But in Prometheus, the gauge value remains 0 even while I make lots of requests to the Spring Boot application.
I have found the reason.
My application handles requests quickly; it may finish processing many requests in one second. I set the Prometheus scrape_interval to 30s (because I don't want to put a heavy load on my machines). So between scrapes, the thread count goes from 0 up to 1, 2, 3, 4, ... and finally back to 0, and the scraped samples are always 0.
My Question:
I don't want to shorten the scrape_interval. Is there any trick to monitor the max and average number of requests per minute while the scrape_interval is 30s? Maybe choosing another metric type instead of a gauge? Any advice would be appreciated.
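For reference, the setup described boils down to something like the sketch below; the servlet filter wiring is my assumption, only the gauge registration comes from the question:

import io.micrometer.core.instrument.MeterRegistry;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.*;

public class ConcurrencyFilter implements Filter {
    private final AtomicInteger concurrentNumber;

    public ConcurrencyFilter(MeterRegistry meterRegistry) {
        // the gauge reports whatever value the AtomicInteger holds at scrape time
        this.concurrentNumber = meterRegistry.gauge("concurrent_thread_number", new AtomicInteger(0));
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        concurrentNumber.incrementAndGet();     // request starts
        try {
            chain.doFilter(req, res);
        } finally {
            concurrentNumber.decrementAndGet(); // request finishes, usually long before the next scrape
        }
    }
}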

prometheus query for continuous uptime

I'm a prometheus newbie and have been trying to figure out the right query to get the last continuous uptime for my service.
For example, if the present time is 0:01:20, and my service came up at 0:00:00, went down at 0:01:01, and came up again at 0:01:10, I'd like to see an uptime of "10 seconds".
I'm mainly looking at the "up{}" metric and possibly combine it with the functions (changes(), rate(), etc.) but no luck so far. I don't see any other prometheus metric similar to "up" either.
The problem is that you need something which tells you when your service was actually up, as opposed to just whether the node was up :)
We use the following (I hope one of these, or at least the general idea, will help):
1. When we look at a host we use node_time{...} - node_boot_time{...}
2. When we look at a specific process / container (docker via cadvisor in our case) we use node_time{...} - on(instance) group_right container_start_time_seconds{name=~"..."}) by(name,instance)
The following PromQL query can be used for calculating the application uptime in seconds:
time() - process_start_time_seconds
This query works for applications written in Go that use either the github.com/prometheus/client_golang or the github.com/VictoriaMetrics/metrics client library, both of which expose the process_start_time_seconds metric by default. This metric contains the Unix timestamp of the application start time.
Kubernetes exposes the container_start_time_seconds metric for each started container by default. So the following query can be used for tracking uptimes for containers in Kubernetes:
time() - container_start_time_seconds{container!~"POD|"}
The container!~"POD|" filter is needed in order to filter out auxiliary time series:
Time series with the container="POD" label reflect e.g. pause containers - see this answer for details.
Time series without a container label correspond to e.g. the cgroups hierarchy. See this answer for details.
If you need to calculate the overall per-target uptime over a given time range, then it is possible to estimate it with the up metric. Prometheus automatically generates an up metric for each scrape target; it is set to 1 on each successful scrape and to 0 otherwise. See these docs for details. So the following query can be used for estimating the total uptime in seconds per scrape target during the last 24 hours:
avg_over_time(up[24h]) * (24*3600)
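For example, if a target was scraped successfully for 18 of the last 24 hours, avg_over_time(up[24h]) returns 0.75, and 0.75 * 24 * 3600 = 64800 seconds, i.e. roughly 18 hours of uptime.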
See avg_over_time docs for details.

Aggregating Counts per min using graphite functions with codahale counter data

Our ecosystem right now is graphite/grafana and we use the codahale metrics java library.
I define a counter
requestCounter = registry.counter(MetricNamespaces.REQUEST_COUNT);
and increment on every request hit to our app
requestCounter.inc();
What we observed with codahale is that the counter is a cumulative value. When we look at the raw data in Grafana, it is a value that keeps increasing over time.
What functions do I use in Graphite so that I can get the request count per minute?
I tried this
alias(summarize(perSecond(sumSeries(app.request.count.*)),
'1m', 'sum', false), 'Request Count')
and also this
hitcount(perSecond(app.request.count.*), '1m')
It doesn't seem right. Can someone please advise on the recommended way to do this, and also whether we can have codahale send just the raw value on each increment instead of a cumulative count?
You should use the nonNegativeDerivative function of the Graphite API if you want to see the rate of a counter:
nonNegativeDerivative(sumSeries(app.request.count.*))
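If you want a per-minute count rather than a per-second rate, one option (same metric path assumed, not tested against your setup) is to sum the per-interval deltas into one-minute buckets:
alias(summarize(nonNegativeDerivative(sumSeries(app.request.count.*)), '1min', 'sum', false), 'Requests per minute')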
Note that you also need to configure your Graphite retention policy for these metrics. Otherwise, if the resolution of your metrics does not match the rate at which codahale sends them, you'll get weird, incorrectly scaled results.
For example, in our company codahale is configured to send data every two seconds, while the Graphite retention policy is 1 second for the first 6 hours and then 10 seconds. If we try to look at results beyond 6 hours, they're scaled incorrectly. I actually got to this question while trying to solve this issue; I'll update here when I have an answer.
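For example, with codahale reporting every two seconds, a matching section in storage-schemas.conf might look roughly like this (the pattern and retention periods are illustrative, not a recommendation):
[app_requests]
pattern = ^app\.request\.
retentions = 2s:6h,10s:30d
Depending on how values are aggregated on rollup (storage-aggregation.conf), counter-style metrics may also need a non-default aggregationMethod; that is worth checking if numbers still look off beyond the first retention period.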

Apache Storm 0.10.0 : Could not get my custom metrics every timeBucketSizeInSecs

I register my custom metric in my bolt with code like this: context.registerMetric("et", _executedTuple, 2). It just counts the number of tuples the bolt emitted, and I register a metrics consumer in my topology.
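For reference, the registration described looks roughly like the sketch below (Storm 0.10 still uses the backtype.storm packages; the bolt itself and the choice of LoggingMetricsConsumer are illustrative assumptions):

import java.util.Map;
import backtype.storm.metric.api.CountMetric;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class CountingBolt extends BaseRichBolt {
    private transient CountMetric _executedTuple;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        _executedTuple = new CountMetric();
        // timeBucketSizeInSecs = 2: the registered consumer should receive a value every 2 seconds
        context.registerMetric("et", _executedTuple, 2);
    }

    @Override
    public void execute(Tuple tuple) {
        _executedTuple.incr();      // count every tuple this bolt handles
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output fields in this sketch
    }
}

// when building the topology, e.g.:
// conf.registerMetricsConsumer(backtype.storm.metric.LoggingMetricsConsumer.class, 1);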
But I only get the executed-tuple count every ten seconds, whereas I would expect the metric to be sent every 2 seconds (the timeBucketSizeInSecs).
Perhaps you know how to solve the problem!

How to get the max count of requests per time period in Splunk

Hi, I'm developing a Rails web application with a Solr search engine inside. The path to get search results is '/search/results'.
Users make many requests when searching for something, and I need the maximum number of search requests in a given time window over all time (to decide whether I need to do some optimization, increase RAM, etc.). I know that there are peak times when the load is critical and search works slowly.
I use the Splunk service to collect app logs, and it's possible to get this request count from the logs, but I don't know how to write the correct Splunk query to get the data I need.
So, how can I get the max number of requests per hour to the '/search/results' path for a date range?
Thanks kindly!
If you can post your example data and/or your sample search, it's much easier to figure out. I'll just post a few examples that I think might lead you in the right direction.
Let's say the '/search/results' is in a field called "uri_path".
earliest=-2w latest=-1w sourcetype=app_logs uri_path="/search/results"
| stats count(uri_path) by date_hour
would give you a count (sum) of those requests per hour of the day over that week.
earliest=-2w latest=-1w sourcetype=app_logs uri_path=*
| stats count by uri_path, date_hour
would split the table (you can think of it as a 'group by') by the different uri_paths.
You can use the time-range picker on the right side of the search bar to select your time with a GUI if you don't want to use the time-range abbreviations (w=week, mon=month, m=minute, and so on).
After that, all you need to do is pipe to the stats command, where you can count by date_hour (which is an automatically generated field).
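Putting those pieces together, a single search along these lines (sourcetype and field names as assumed above) lists the busiest hours first:
earliest=-1mon latest=now sourcetype=app_logs uri_path="/search/results"
| stats count by date_hour
| sort -count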
NOTE:
If you don't have the uri_path field already extracted, you can do it really easily with the rex command.
... | rex "matching stuff before uri path (?<uri_path>\/\w+\/\w+) stuff after"
| search uri_path="/search/results"
| stats count(uri_path) by date_hour
In case you want to learn more:
Stats Functions (in Splunk)
Field Extractor - for permanent extractions
