Webflux: CancelledServerWebExchangeException appears in metrics for seemingly no reason - spring-boot

After upgrading to spring-boot 2.5, CancelledServerWebExchangeException started to appear quite frequently in the Prometheus http_server_requests_seconds metrics (according to the graphs, up to 10% of server responses end up with it). It appears in the metrics for my own API as well as for the actuator endpoints (health, info, prometheus).
Example:
http_server_requests_seconds_count{exception="CancelledServerWebExchangeException",method="GET",outcome="UNKNOWN",status="200",uri="/actuator/health"} 137.0
The combination of outcome="UNKNOWN" and status="200" is rather strange.
The problem is: all these requests have successful responses.
Questions: what is this exception for and why may it occur so often?
How to reproduce: start application locally and put some load on it (I used 50 threads in jmeter accessing actuator endpoints)

Related

Usage of micrometer-registry-prometheus slow down my Spring Boot application

I have Spring Boot application 2.5.7 where I set up a micrometer to scrape metrics
runtimeOnly("io.micrometer:micrometer-registry-prometheus")
When I make a request locally http://localhost:8081/actuator/prometheus
There are no performance problems with my application
But when I make a request to the actuator on the server with a high load
https://myserver:8081/actuator/prometheus
it returns a lot more data in the response and it also slows down all requests that are currently running on my server.
The problem appears even after a single request to /actuator/prometheus
Is there any way to optimize Micrometer (while returning the same amount of metrics) so that it does not slow down my application?
Without sufficient data it is hard to give a recommendation. If the slowness is due to insufficient memory/garbage collection, try increasing the memory of your application.
Reviewing the metrics being returned may also give you some ideas. For example, if you have a high thread count, I think there is a pause when Micrometer iterates over the thread statuses. You could look into disabling that metric.

CAS Actuator Health Endpoints Return 403 Intermittently

I recently upgraded CAS to 6.4.6.x and noticed that the liveness/readiness probes will intermittently throw 403 error codes. It appears to be a threading issue in the Spring Security Filter Chain. I have validated with the barebone CAS images that this does not happen in the 6.3.x version but can repeat it rather easily with the 6.4.x version. My configuration has not changed after the upgrade and I'm following the documentation.
Endpoint Configuration:
# allow all by default
cas.monitor.endpoints.endpoint.defaults.access[0]=PERMIT
# enable the health endpoint
management.endpoints.enabled-by-default=true
management.endpoints.web.base-path=/actuator
management.endpoints.web.exposure.include=health
management.endpoint.health.enabled=true
When running load tests against the instance, I get 200 responses if I send 1 request at a time. If I bump the concurrency up to 2 or more, I can reproduce the threading issue: some of the responses come back as 403 after being picked up by the Spring default error controller.
Setting a breakpoint on the Error Controller, I'm able to see the same thread in the logs essentially jump to two different points in the code path.
I've gone through the Pull Requests from 6.3.x to 6.4.x and nothing jumped out to me that might be causing this issue. I haven't seen any issues raised up in Spring Boot around the Actuator Health Points failing. I've bumped up Spring and Tomcat to the latest patch versions. Any thoughts on what could be causing this or other things I could try to determine how to fix it?

Scale SpringBoot App based on Thread Pool State

We have a Spring Boot microservice which has to get some data from an old / legacy system. This microservice exposes a modern external REST API. Sometimes we have to issue 7-10 requests to the legacy system in order to get all the data we need for a single API call. Unfortunately we can't use Reactor / WebClient and have to stick with WebServiceTemplate to issue those "legacy" calls. We also can't use the approach from "Reactive Spring WebClient - Making a SOAP call".
What is the best way to scale such a microservice in Kubernetes? We are very concerned that the thread pool used for parallel WebServiceTemplate invocations will be depleted very quickly, but I'm not sure that creating and exposing a custom metric based on active thread count / thread pool size is a good idea.
Any advice will be helpful.
Enable Prometheus exporter in Spring
Make sure metrics are scraped. You're going to watch for a threadpool_size metric. Refer to your k8s/Prometheus distro docs to get Prometheus service discovery working for you.
Write a horizontal pod autoscaler (HPA) based on a Prometheus metric:
Set up Prometheus-Adapter and follow the HPA walkthrough.
Or follow this guide https://github.com/stefanprodan/k8s-prom-hpa
Depending on which k8s distro you are using, you might have different ways to get Prometheus and Prometheus service discovery:
(example platform built-in) https://cloud.google.com/stackdriver/docs/solutions/gke/prometheus
(example product) https://docs.datadoghq.com/integrations/prometheus/
(example opensource) https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
any other prometheus solution
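
The answer above assumes that something like a threadpool_size metric is available to scrape. As a rough sketch only, the values such a metric could be built from can be read directly from a java.util.concurrent.ThreadPoolExecutor. The class name PoolUtilization and its methods are hypothetical and exist only for this illustration. It also assumes the WebServiceTemplate calls run on a ThreadPoolExecutor, and it leaves out how the numbers are exported to Prometheus.

```java
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical helper (not from the question): reports how saturated a worker pool is. */
final class PoolUtilization {
    private PoolUtilization() {}

    /** Fraction of the pool's maximum threads that are currently running tasks. */
    static double activeRatio(ThreadPoolExecutor pool) {
        int max = pool.getMaximumPoolSize();
        // getActiveCount() is documented as an approximation, which is fine for a gauge.
        return max == 0 ? 0.0 : (double) pool.getActiveCount() / max;
    }

    /** Number of tasks waiting in the pool's queue for a free worker thread. */
    static int queued(ThreadPoolExecutor pool) {
        return pool.getQueue().size();
    }
}
```

A ratio that stays close to 1.0, or a queue length that keeps growing, would be the kind of signal an HPA rule could scale on. Which threshold makes sense depends on the latency of the legacy calls.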

Spring boot service higher response times under heavy load

The response time of my Spring Boot REST service running on embedded Tomcat sometimes goes really high. I have isolated the external dependencies and all of them are pretty quick.
I am at a point that I think that it is something to do with tomcat's default 200 thread pool size that it reserves only for incoming requests for the service.
What I believe is that all 200 threads under heavy load (100 requests per second) are held up and other requests are queued and lead to higher response time.
I was wondering if there is a definitive way to find out whether the incoming requests are really getting queued. I have done extensive research in the Tomcat documentation and the Spring Boot embedded container documentation. Unfortunately I don't see anything relevant.
Does anyone have any ideas on how to check this?

Limit number of parallel requests to spring boot actuator health

We are using Spring Boot Actuator to get the health status of an application. My understanding is that a health check request is handled by a thread from the same thread pool that serves the actual service requests.
Is there a way to limit the number of requests to the health endpoint to prevent a DDoS-type starvation?
You can use the Spring Boot Throttling community library. I think you could restrict DDoS access to your endpoints (Actuator or otherwise) using its configuration.
https://github.com/weddini/spring-boot-throttling
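
Independently of that library (the following is not its API), the basic idea of capping parallel health requests can be sketched with a plain java.util.concurrent.Semaphore. The class ConcurrencyLimiter, its method callOrReject and the "BUSY" fallback value are hypothetical names chosen for this illustration. Wiring it into a servlet filter or an Actuator endpoint is left out.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical guard that caps how many callers may run an action at the same time. */
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Runs the action if a permit is free. Otherwise returns the fallback
     * immediately, so the calling request thread is never blocked.
     */
    <T> T callOrReject(Supplier<T> action, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }
}
```

For example, `new ConcurrencyLimiter(2).callOrReject(() -> "UP", "BUSY")` would run the check only while fewer than two other checks are in flight. In a web application the fallback would typically be mapped to a 429 or 503 response.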
Another possibility to reduce DDOS vulnerability on the /health endpoint is to have your health checks run on a separate thread pool.
This ensures that:
no more than one health indicator concurrently runs at any given time against an underlying service
your /health endpoint returns instantly (as it returns healths pre-calculated on different threads).
For this purpose, and if you are using Spring Boot >= 2.2, you can use the separate library spring-boot-async-health-indicator to run your health checks on a separate thread pool by simply annotating them with @AsyncHealth.
Disclaimer: I created this library to address this issue (among others)
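
To illustrate the idea only (this is not the library's implementation), pre-calculating a health status on a separate thread and serving the cached value can be sketched with the JDK alone. The class CachedHealth, the thread name "health-refresh" and the plain string statuses are assumptions made for this example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical cache: refreshes a health status on its own thread, serves reads instantly. */
final class CachedHealth implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "health-refresh");
                t.setDaemon(true); // do not keep the JVM alive just for health refreshes
                return t;
            });

    // Value returned to callers until the first check has completed.
    private volatile String status = "UNKNOWN";

    CachedHealth(Supplier<String> check, long period, TimeUnit unit) {
        // A single scheduler thread means at most one check runs at any given time.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                status = check.get();
            } catch (RuntimeException e) {
                status = "DOWN"; // a failing check must not cancel future refreshes
            }
        }, 0, period, unit);
    }

    /** Returns the most recently computed status without running the check. */
    String current() {
        return status;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A request handler would only call `current()`, so even a burst of health requests never triggers more than one concurrent check against the underlying service.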
