Monitoring a frequently changing value using Micrometer and Prometheus with restrictions - spring-boot

Background:
I have a Spring Boot application, and I want to monitor the max and average number of requests per minute.
Since the server assigns a thread to each request, I can observe the number of active threads. I use Micrometer to expose the metric and Prometheus to pull it. I chose the Gauge type to track the thread count.
AtomicInteger concurrentNumber = meterRegistry.gauge("concurrent_thread_number", new AtomicInteger(0));
But in Prometheus, the gauge value remains 0 even while I make lots of requests to the Spring Boot application.
I have found the reason.
My application processes requests very quickly and may finish many of them within a single second. I set the Prometheus scrape_interval to 30s (because I don't want the scraping to put extra load on my machines), so between scrapes the thread count rises from 0 to 1, 2, 3, 4, ... and falls back to 0. As a result, both samples are 0.
My Question:
I don't want to shorten the scrape_interval. Is there any trick to monitor the max and average number of requests per minute while the scrape_interval is 30s? Maybe choosing another type of metric instead of a gauge? Any advice would be appreciated.
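One possible approach (a sketch, not a confirmed solution from this thread): record a Micrometer Counter instead of, or alongside, the gauge. A counter only ever increases, so nothing is lost between scrapes, and Prometheus can reconstruct per-minute rates and maxima on the server side. The metric name requests_handled below is illustrative:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// registered once, wherever the MeterRegistry is available (e.g. injected by Spring Boot)
Counter handledRequests = Counter.builder("requests_handled")
        .description("Total number of requests handled")
        .register(meterRegistry);

// incremented once per request instead of moving a gauge up and down
handledRequests.increment();

On the Prometheus side, an expression like rate(requests_handled_total[1m]) * 60 then approximates requests per minute even with a 30s scrape_interval (Micrometer's Prometheus registry appends the _total suffix to counters), and a per-minute maximum can be layered on top with max_over_time via a recording rule or subquery.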

Related

What happens if I give a large value to server.tomcat.max-threads to handle load on my application?

There are around 1000+ jobs running through our service in a day, with around 70-80 jobs starting at the same time and running in parallel.
To handle this, we thought that setting the server.tomcat.max-threads property of our Spring application to a large number should work, but I am not fully confident about the side effects of giving this property a huge value like 800.
Can you please help here?
The default installation of Tomcat sets the maximum number of HTTP servicing threads at 200. Effectively, this means that the system can handle a maximum of 200 simultaneous HTTP requests. When the number of simultaneous HTTP requests exceeds this count, the unhandled requests are placed in a queue, and the requests in this queue are serviced as processing threads become available. This default queue length is 100. At these default settings, a large web load that can generate over 300 simultaneous requests will surpass the thread availability, resulting in service unavailable (HTTP 503).
More reference: https://docs.bmc.com/docs/brid91/en/tomcat-container-workload-configuration-825210082.html
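For reference, a sketch of how those two knobs could be raised programmatically in a Spring Boot 2.x app (the values 800 and 200 are purely illustrative; the same settings are normally set via the server.tomcat.max-threads and server.tomcat.accept-count properties):

import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatTuning {

    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> tomcatCustomizer() {
        return factory -> factory.addConnectorCustomizers(connector -> {
            connector.setProperty("maxThreads", "800");   // request-processing threads (default 200)
            connector.setProperty("acceptCount", "200");  // queue length when all threads are busy (default 100)
        });
    }
}

The main side effects of a very large maxThreads value are memory (each thread reserves stack space, typically around 1 MB) and extra context switching; it only helps if whatever the threads wait on (database connection pools, external services, CPU) can actually keep up.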
How to run multiple servlet executions in parallel in Tomcat?
If this is a batch-job-like setup, you can use Spring Batch.

Spring Boot thread pool executor RestTemplate behavior when queueCapacity is 0 is decreasing performance for a REST API application

I am stuck with a strange problem and am not able to find its root cause. This is my RestTemplate thread pool executor configuration:
connectionRequestTimeout: 60000
connectTimeout: 60000
socketTimeout: 60000
responseTimeout: 60000
connectionpoolmax: 900
defaultMaxPerRoute: 20
corePoolSize: 10
maxPoolSize: 300
queueCapacity: 0
keepAliveSeconds: 1
allowCoreThreadTimeOut: true
1) I know that when queueCapacity is 0, the thread pool executor creates a SynchronousQueue. The first issue is that if I give it a positive integer value such as 50, application performance decreases. As per my understanding, we should only be using a SynchronousQueue in rare cases, not in a Spring Boot REST API based application like mine.
2) Secondly, I want to understand how a SynchronousQueue works in a Spring Boot REST API application deployed on a server (Tomcat). I know a SynchronousQueue has zero capacity, so a producer blocks until a consumer is available or a new thread is created. But who are the consumer and the producer in this case, given that all the requests are served by a web or application server? How does the SynchronousQueue actually work in this case?
I am checking the performance by running a JMeter script on my machine. This script can handle more cases with queueCapacity set to 0 than with a value greater than 0.
I really appreciate any insight.
1) Don't set queueCapacity explicitly; otherwise it is bound to degrade performance, since we are limiting the number of incoming requests that can reside in the queue, and each queued request is only picked up once one of the threads from the fixed thread pool becomes available.
ThreadPoolTaskExecutor has a default configuration of the core pool
size of 1, with unlimited max pool size and unlimited queue capacity.
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/scheduling/concurrent/ThreadPoolTaskExecutor.html
2) In a SynchronousQueue, pairs of insert and remove operations always occur simultaneously, so the queue never actually contains anything. It passes data synchronously to another thread: the producer waits for the other party to take the data instead of just putting the data down and returning.
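To make the producer and consumer concrete, here is a minimal sketch (pool sizes mirror the configuration above and are illustrative): the thread that submits a task, e.g. the Tomcat worker thread handling the HTTP request, is the producer, and the pool thread that picks the task up is the consumer. With queueCapacity = 0, Spring's ThreadPoolTaskExecutor backs the pool with a SynchronousQueue, so each submission is handed straight to an idle pool thread; if none is idle, a new thread is created up to maxPoolSize, and beyond that the submission is rejected.

import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(10);
executor.setMaxPoolSize(300);
executor.setQueueCapacity(0);             // 0 -> SynchronousQueue: direct hand-off, no buffering
executor.setKeepAliveSeconds(1);
executor.setAllowCoreThreadTimeOut(true);
executor.initialize();

// the calling (producer) thread hands the task straight to a pool (consumer) thread
executor.execute(() -> System.out.println("handled by " + Thread.currentThread().getName()));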
Read more:
https://javarevisited.blogspot.com/2014/06/synchronousqueue-example-in-java.html#ixzz6PFz4Akom
https://www.baeldung.com/thread-pool-java-and-guava
I hope my answer helps you in some way.

What can cause a Cloud Run instance to not be reused despite continuous load?

Context:
My Spring-Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: It receives a constant stream of pubsub messages via push, and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response-time varies between 50ms and 60sec, probably due to some resource contention.
In order to increase throughput/ decrease resource contention, I'm looking to experiment with the connection pool size per app-instance, as well as the concurrency and max-instances parameters for my cloud run app.
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a spring-boot app to cloud run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, leading me to believe that instances get started (cold start), handle a single request, die, and then get started again. They are not being reused as described in the docs, and as happens when I set max-instances to 1. Official docs on concurrency
Instead, I expect 3 container instances to be started, each of which then handles requests according to the concurrency setting.
Billable container time at max-instances=3:
As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before at max-instances=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
Connection Limit of CloudSQL instance has not been exceeded
Requests are handled at less than 10/s
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some months ago, in the private Alpha, I performed tests and observed the same behavior. After discussion with the Google team, I understood that instances are over-provisioned "in case of": an instance crashes, an instance is preempted, the traffic suddenly increases, ...
The trade-off of this is that you will have more cold starts than your max-instances value. Worse, you will be charged for these over-provisioned cold starts -> this is not really an issue because Cloud Run has a huge free tier that covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink of the Cloud Run logs into BigQuery and then querying them), even if there are more instances up than your max instances, only your max instances are active at the same time. I'm not sure I'm being clear: with your parameters, that means that if you have 5 instances up at the same time, only 3 serve traffic at any given point in time.
This part is not documented because it evolves constantly to find the best balance between over-provisioning and lack of resources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and will try to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response. This is done in order to adapt to a possibly higher number of network requests beyond what it is currently serving, taking into consideration the length of time it will take for the existing instances to finish loading the requests. Per the documentation, the number of container instances can go above the max-instances value when traffic spikes.
You mentioned that with max-instances set to 1 it was running fine, but later you mentioned it was in fact producing 429s with it set to 1 as well. Seeing 429s as well as the instances spiking could indicate that the amount of traffic is not being handled fluidly.
It is also worth noting that, because of the cold-start time you mention, while an instance is serving its first request(s), the number of concurrent requests is by design hard-set to 1. Only once things are fully ready is the concurrency setting you have chosen applied.
Was there some specific reason you chose 3 and 3 for the max-instances and concurrency settings? Also, how was the concurrency set when you had max-instances set to 1? Perhaps you could try increasing the concurrency (max 80) and/or max instances (upper limit of 1000) further and see if that removes the 429s.

Validate newly created server support the same load

We are creating a new hosted server for one of our APIs on managed containers (Kubernetes), and we're trying to validate that it can handle at least the same traffic load.
We've started with one of the APIs, where we would need to handle at least 140k requests per minute, all endpoints combined.
To verify this, I created a simple JMeter test as follows:
-Test Plan
---Thread Group Endpoint1
-----HTTP Request -> a GET request with query params for /path1
---Thread Group Endpoint2
-----HTTP Request -> a GET request with query params for /path2
For a local test, I used the following setup:
Thread Groups Endpoint1 and Endpoint2 are set to 200 threads (users), ramp-up period of 1s, loop count = forever and duration 60s.
Using a Summary Report listener when running the test gets me a total of ~9300 # Samples.
Using this approach, is it safe to just increase the number of threads (users) for the Thread Groups until I reach the desired 140k requests per minute?
Note: I have only used JMeter a little before, so I'm aware that the entire approach may be wrong; any suggestions and steering toward the right path are more than welcome.
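As a rough back-of-the-envelope estimate (assuming throughput grows roughly linearly with the thread count and the server does not become the bottleneck first): ~9300 samples in 60 s from 400 total threads is about 155 requests/second, while 140k requests/minute is about 2333 requests/second, i.e. roughly 15x the current load, which would put the required thread count somewhere on the order of 6000.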
Your approach is viable as long as it represents real-life application usage. If it has 2 endpoints with an equally/evenly distributed load, your setup is just fine. If there are more endpoints and some of them are used more than the others, consider defining the workload correspondingly, either using different Thread Groups or another distribution mechanism such as the Throughput Controller.
Increasing the number of threads is also fine; however, consider increasing the load gradually, i.e. increase the ramp-up time, so your test has:
Arrivals phase
Time to hold the load
Ramp-down phase
This way you will be able to correlate various metrics like increasing response time, throughput, number of errors, etc. with the increasing load. You will also be able to state what the number of threads/requests per second was when the system reached its saturation/breaking point, and whether it recovers when the load goes back down.
Also make sure you're following JMeter Best Practices, as 2300-2500 requests per second is not something JMeter can support out of the box; you will need to do some tuning, at the very least increasing the JVM heap size allocated to JMeter.
You may not be able to achieve the desired 140k requests per minute using a single JMeter machine; in that case you'll need a distributed load testing approach.
refer: http://jmeter.apache.org/usermanual/jmeter_distributed_testing_step_by_step.html
Also, keeping the ramp-up period at 1 second will lead to a spike of unrealistic load on the system, which will not give proper results unless you have pre-warmed your server; you should increase the load gradually according to the real/estimated traffic pattern.

Aggregating Counts per min using graphite functions with codahale counter data

Our ecosystem right now is Graphite/Grafana, and we use the Codahale Metrics Java library.
I define a counter
requestCounter = registry.counter(MetricNamespaces.REQUEST_COUNT);
and increment on every request hit to our app
requestCounter.inc();
What we observed with Codahale is that the counter is a cumulative value... When we look at the raw data in Grafana, it is an increasing value over a period of time.
What functions do I use in Graphite so that I can get the request count per minute?
I tried this
alias(summarize(perSecond(sumSeries(app.request.count.*)), '1m', 'sum', false), 'Request Count')
and also this
hitcount(perSecond(app.request.count.*), '1m')
It doesn't seem right. Can someone please advise on the recommended way, and also whether we can have Codahale send just the raw data when incremented instead of a cumulative count?
You should use the nonNegativeDerivative function of the Graphite API if you want to see the rate of a counter:
nonNegativeDerivative(sumSeries(app.request.count.*))
Note that you also need to configure your Graphite retention policy for these metrics. Otherwise, if the resolution of your metrics does not match the rate at which Codahale sends them, you'll get weird, unscaled results.
For example, in our company Codahale is configured to send data every two seconds, while the Graphite retention policy is 1 second for the first 6 hours and 10 seconds after that. If we try to look at results beyond 6 hours, they're scaled incorrectly. I actually got to this question while trying to solve this issue; I'll update here when I have an answer.
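If the goal is to have the client report a rate rather than a cumulative count, one option (a sketch, assuming a standard Dropwizard/Codahale MetricRegistry with a GraphiteReporter) is to use a Meter instead of a Counter; the reporter then publishes pre-computed moving-average rates such as an m1_rate series, already expressed in events per second, alongside the raw count. The metric name app.request.rate is illustrative:

import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

MetricRegistry registry = new MetricRegistry();

// a Meter tracks both the total count and 1/5/15-minute moving-average rates
Meter requestMeter = registry.meter("app.request.rate");

// mark() once per request, exactly where requestCounter.inc() is called today
requestMeter.mark();

In Grafana you could then plot something like scale(app.request.rate.m1_rate, 60) to display requests per minute.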
