SignalFx: count the number of requests whose response time exceeds a threshold

I have a metric named http.server.request that stores the duration it took to respond to an HTTP request. I want to get the number of requests for which this duration is above a threshold (say 1000 ms). I can see various percentiles (p99, p90, mean, etc.) in the options, but nothing that gives me this count. Is it possible to get this?

Related

What does the term `requests` in `requests_per_second` actually mean?

In the Delete By Query API docs, under the "Throttling delete requests" section:
To control the rate at which delete by query issues batches of delete operations, you can set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to disable throttling.
Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
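The arithmetic above can be sketched in a few lines (values taken from the documentation's example; the function name is mine, not part of the Elasticsearch API):

```python
def throttle_wait_time(batch_size, requests_per_second, write_time):
    """Compute the padding Elasticsearch sleeps between delete-by-query batches.

    target_time is how long a batch *should* take at the requested rate;
    the wait is whatever is left over after the actual write time.
    """
    target_time = batch_size / requests_per_second
    return max(target_time - write_time, 0.0)

# Example from the docs: batch of 1000 docs, requests_per_second=500,
# writing took 0.5 s -> target 2 s, wait 1.5 s
print(throttle_wait_time(1000, 500, 0.5))  # 1.5
```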
Q1: From the above example, it seems that the word requests in requests_per_second refers to the documents in the ES index, not to the internal scroll/batch requests; that is, this parameter actually controls how many docs are handled per second, right? If so, I think docs_per_second might be a better name.
Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".
Q2: I don't quite understand why the batch is issued as a single _bulk request.
Q1: No, actually it's scroll_size that determines the batch size for the scrolled results used by _delete_by_query.
Q2: Each batch is the set of documents returned by one internal scroll/_search (sized by scroll_size); the DELETE operations generated for those documents are then sent together in a single _bulk request. So every batch of selected documents is deleted in one _bulk call.
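The mechanics described above can be sketched roughly as follows (pure Python, not real Elasticsearch client calls; all names are illustrative):

```python
def delete_by_query_batches(total_docs, scroll_size, requests_per_second, write_time):
    """Simulate how _delete_by_query walks the matching documents:
    each internal scroll returns up to scroll_size docs, those docs are
    deleted in ONE _bulk request, then the loop pads with a wait so the
    overall rate honors requests_per_second."""
    bulk_calls = 0
    total_wait = 0.0
    remaining = total_docs
    while remaining > 0:
        batch = min(scroll_size, remaining)   # one internal scroll/_search
        bulk_calls += 1                        # one _bulk for the whole batch
        target_time = batch / requests_per_second
        total_wait += max(target_time - write_time, 0.0)
        remaining -= batch
    return bulk_calls, total_wait

# 2500 matching docs, scroll_size=1000, 500 docs/s, 0.5 s writing per batch
print(delete_by_query_batches(2500, 1000, 500, 0.5))  # (3, 3.5)
```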

Rate limiting WebClient requests after a certain number of retries?

Is it possible to rate limit WebClient requests for a URL, or any requests, after a defined number of retries, using Resilience4j or otherwise?
Following is a sample code.
webClient.post()
.bodyValue(...)
.header(...)
.uri(...)
.retrieve()
.bodyToMono(Void.class)
.retryWhen(...) // every 15 mins, for 1 day
.subscribe();
An example use case: say 10,000 requests in a day need to be sent to 100 different URLs. There are retries for the case when a URL is temporarily unavailable.
But if a URL comes back up after a few hours, it would have accumulated a large number of requests, which I would like to rate limit.
Effectively, I don't want to rate limit the whole operation, but only specific URLs that have been unavailable for a long time, or requests that have already been retried 'x' times. Is there a way to achieve this?
Not to be confused with a circuit breaker requirement: it's not a problem to keep retrying every few minutes, at least in this context.
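One way to get the "only after x retries" behavior is to keep a retry counter per URL and only apply a per-URL rate limit once that counter crosses the threshold. Below is a plain-Python sketch of the idea, independent of Resilience4j; the class and its names are mine, not any library's API. In a Reactor pipeline, the analogous approach would be to select (or skip) a per-URL rate limiter inside the retryWhen logic based on the attempt number.

```python
from collections import defaultdict

class PerUrlRetryLimiter:
    """Enforce a minimum spacing between requests to a URL, but only
    once that URL has been retried more than `retry_threshold` times."""

    def __init__(self, retry_threshold, min_interval_s):
        self.retry_threshold = retry_threshold
        self.min_interval = min_interval_s
        self.retries = defaultdict(int)
        self.last_sent = {}

    def record_retry(self, url):
        self.retries[url] += 1

    def wait_time(self, url, now):
        """Seconds to wait before sending to `url` at time `now`."""
        if self.retries[url] <= self.retry_threshold:
            return 0.0  # healthy URL: no limiting
        last = self.last_sent.get(url)
        if last is None:
            return 0.0
        return max(self.min_interval - (now - last), 0.0)

    def mark_sent(self, url, now):
        self.last_sent[url] = now

limiter = PerUrlRetryLimiter(retry_threshold=3, min_interval_s=60.0)
for _ in range(4):                       # flaky URL exceeds the threshold
    limiter.record_retry("http://flaky.example")
limiter.mark_sent("http://flaky.example", now=0.0)
print(limiter.wait_time("http://flaky.example", now=10.0))    # 50.0
print(limiter.wait_time("http://healthy.example", now=10.0))  # 0.0
```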

"jp@gc - Response Times Over Time" listener in JMeter is not showing the max response time shown in the Summary Report listener; it shows the average

I want the response time graph to show the maximum response time, but the graph is not showing it. It shows a max average time of 8395 ms, while I want it to show the details of the maximum response time, which was 23510 ms.
The Response Times Over Time listener actually shows the average response time in milliseconds for each sampler. Can I change it to show the max response time instead? Please guide.
I found a solution to get the response time graph based on max response time.
There is a graph in the JMeter HTML report that is generated by default when you generate the report: "Response Time Percentiles Over Time". There I disabled all the series except Max response time.

Aggregating Counts per min using graphite functions with codahale counter data

Our ecosystem right now is Graphite/Grafana, and we use the Codahale Metrics Java library.
I define a counter
requestCounter = registry.counter(MetricNamespaces.REQUEST_COUNT);
and increment it on every request that hits our app
requestCounter.inc();
What we observed with Codahale is that the counter is a cumulative value: when we look at the raw data in Grafana, it is a monotonically increasing value over time.
What functions do I use in Graphite to get the request count per minute?
I tried this
alias(summarize(perSecond(sumSeries(app.request.count.*)),
'1m', 'sum', false), 'Request Count')
and also this
hitcount(perSecond(app.request.count.*), '1m')
It doesn't seem right. Can someone please advise on the recommended way, and also whether we can have Codahale send just the raw increments instead of a cumulative count?
You should use the nonNegativeDerivative function of the Graphite API if you want to see the rate of a counter:
nonNegativeDerivative(sumSeries(app.request.count.*))
You also need to configure your Graphite retention policy for these metrics. Otherwise, if the resolution of your metrics does not match the rate at which Codahale sends them, you'll get weird, incorrectly scaled results.
For example, in our company Codahale is configured to send data every two seconds. The Graphite retention policy is 1 second for the first 6 hours and then 10 seconds. If we try to look at results beyond 6 hours, they're scaled incorrectly. I actually got to this question while trying to solve that issue; I'll update here when I have an answer.
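What nonNegativeDerivative does can be illustrated with a few lines of Python (my re-implementation of the idea for clarity, not Graphite's code): it turns a cumulative counter series into per-interval deltas and treats a decrease (a counter reset, e.g. after an app restart) as a gap rather than a negative rate.

```python
def non_negative_derivative(samples):
    """samples: cumulative counter values, one per storage interval.
    Returns the per-interval increase; the first point and any point
    following a counter reset yield None (no derivative available)."""
    out = []
    prev = None
    for value in samples:
        if prev is None or value < prev:
            out.append(None)   # first point or reset: no derivative
        else:
            out.append(value - prev)
        prev = value
    return out

# Cumulative requests: 0, 40, 95, then the app restarts and counts from 7
print(non_negative_derivative([0, 40, 95, 7, 30]))
# [None, 40, 55, None, 23]
```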

Calculation of the total response time using JMeter

I ran a test and generated various reports using the listeners, but I am confused about how to calculate the total response time for the test run. I have attached the screenshots.
Why do you need the total response time of all the requests in the test? Are all these requests part of one transaction?
1) In that case, you can place all the requests under a Transaction Controller. This gives you the total time of all the requests placed under it.
2) You can calculate the total response time using a Beanshell listener.
3) Write the results to a CSV file and calculate the sum yourself.
In your case, each sample made only one request, so the total time would be Average * (number of samples) = 448 ms.
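The arithmetic in that last answer is just a multiplication; a minimal sketch, with illustrative numbers (the real average and sample count come from your Summary Report, not from here):

```python
def total_response_time(average_ms, num_samples):
    """When every sampler makes exactly one request, the summed
    response time is simply the average times the sample count."""
    return average_ms * num_samples

# Hypothetical values chosen so the product matches the 448 ms above
print(total_response_time(56, 8))  # 448
```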
