Spring Boot Actuator /health endpoint slow response? - spring-boot

First time I'm trying out the actuator dependency, I configured it to be opt-in
management.endpoints.enabled-by-default=false
management.endpoint.info.enabled=true
management.endpoint.health.enabled=true
When I call /actuator/health locally it takes around 1.4 seconds to respond. Keep in mind this is a local call, made from the same machine as the server.
If I create a regular endpoint that replies with an empty response, the request would take just a couple of milliseconds.
Is this normal? Can I make it reply faster?

Original Answer: https://stackoverflow.com/a/63666118/1861769
Basically, the health endpoint is implemented so that it holds a
list of all Spring beans that implement the HealthIndicator interface.
Each health indicator is responsible for supplying health
information about one subsystem (examples of such subsystems are: disk,
postgres, mongo, etc.). Spring Boot comes with some predefined
HealthIndicators.
So that when the health endpoint is invoked, it iterates through this
list and gets the information about each subsystem and then constructs
the answer.
Hence you can place a break point in relevant health indicators
(assuming you know which subsystems are checked) and see what happens.
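If a debugger is not handy, timing each check separately gives the same information. The following is only a minimal sketch in plain Java, assuming each indicator can be represented as a named Supplier; HealthTimer, timeEach and slowest are hypothetical names and not part of Spring Boot.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for the health endpoint's loop over its indicators. */
final class HealthTimer {

    /** Runs every check once and returns the elapsed milliseconds per check name. */
    static Map<String, Long> timeEach(Map<String, Supplier<String>> checks) {
        Map<String, Long> elapsedMillis = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<String>> entry : checks.entrySet()) {
            long start = System.nanoTime();
            entry.getValue().get(); // the returned status is ignored, only the time matters
            elapsedMillis.put(entry.getKey(), (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMillis;
    }

    /** Returns the name of the slowest check, or null if there are no checks. */
    static String slowest(Map<String, Long> elapsedMillis) {
        String worst = null;
        long max = -1;
        for (Map.Entry<String, Long> entry : elapsedMillis.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                worst = entry.getKey();
            }
        }
        return worst;
    }
}
```

In a real application the suppliers would wrap calls to the actual HealthIndicator beans, for example `() -> indicator.health().getStatus().getCode()`.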
If you're looking for the HTTP entry point - the code that gets called
when you call http:///actuator/health (can vary depending
on your settings but you get the idea) - it can be found here
Yet another approach that comes to mind is disabling "suspicious"
health checks and finding the slow one by elimination.
For example, if you have Elasticsearch and would like to disable
its check, use this in application.properties:
management.health.elasticsearch.enabled = false

Related

Is OpenTracing enabled for Reactive Routes in Quarkus?

I have recently changed my Quarkus application from RestEasy to Reactive Routes to implement my HTTP endpoints.
My Quarkus app had OpenTracing enabled and it was working fine. After changing the HTTP resource layer I can not see any trace in Jaeger.
After setting the log level to DEBUG I can see my application is registered in Jaeger, but I don't see any traceId or spanId in the logs, nor any traces in Jaeger:
15:44:36 DEBUG traceId=, spanId=, sampled= [io.qu.ja.ru.JaegerDeploymentRecorder] (main) Registering tracer to GlobalTracer JaegerTracer(version=Java-0.34.3, serviceName=employee, reporter=RemoteReporter(sender=HttpSender(), closeEnqueueTimeout=1000), sampler=ConstSampler(decision=true, tags={sampler.type=const, sampler.param=true}), tags={hostname=employee-8569585469-tg8wg, jaeger.version=Java-0.34.3, ip=10.244.0.21}, zipkinSharedRpcSpan=false, expandExceptionLogs=false, useTraceId128Bit=false)
15:45:03 INFO traceId=, spanId=, sampled= [or.se.po.re.EmployeeResource] (vert.x-eventloop-thread-0) getEmployees
I'm using the latest version of Quarkus which is 1.9.2.Final.
Is OpenTracing enabled when I'm using Reactive Routes?
Tracing is enabled by default for JAX-RS endpoints only, not for reactive routes at the moment. You can activate tracing by annotating your route with @org.eclipse.microprofile.opentracing.Traced.
Yes, adding @Traced activates tracing on reactive routes.
Unfortunately, using both JAX-RS reactive endpoints and reactive routes breaks the tracing on the event-loop threads used by the JAX-RS reactive endpoints when they get executed.
I only started with Quarkus 2 days ago, so I don't really know the reason for this behavior (or whether it's normal or a bug), but switching between the two clearly messes up the tracing.
Here is an example to easily reproduce it:
Create a RESTEasy Reactive endpoint returning an empty Multi
Create a custom reactive route
Set the number of IO threads to 2 (makes it easier to reproduce quickly)
Run the application, and request the two endpoints alternately
Here is a screenshot that shows the issue
As you can see, as soon as the JAX-RS resource is hit and executed on one of the two available threads, it "corrupts" that thread, messing up the trace_id reported in the logs for the next calls of the reactive route (I don't know whether it's the generation or the reporting in the logs that is broken).
This does not happen on the JAX-RS resource, as you can notice on the screenshot as well. So it seems to be related to reactive routes only.
Another point is that JAX-RS Reactive resources are incorrectly reported in Jaeger (with a mention of a missing root span). I'm not sure whether it's related to the issue, but it's another annoying point.
I'm thinking of completely removing the JAX-RS Reactive endpoints and replacing them with normal reactive routes to eliminate this bug.
I would appreciate it if someone with more experience than me could verify this or tell me what I did wrong :)
EDIT 1: I added a route filter with priority 500 to clear the MDC and the bug is still there, so definitely not coming from MDC.
EDIT 2: I opened a bug report on Quarkus
EDIT 3: It seems related to how the two implementations work (thread locals versus context propagation in an actor-based context)
So, unless JAX-RS reactive resources are marked @Blocking (and get executed in a separate thread pool), JAX-RS reactive and Vert.x reactive routes are incompatible when it comes to tracing (and probably also for MDC-related information, since MDC is thread-bound as well)

Make spring-boot 2.2.0 report status = UP, even when the DB is down?

Up to spring-boot 2.1.9, I used to set management.health.defaults.enabled = false to decouple the /health endpoint overall status from the database status.
As of 2.2.0, that specific setting no longer works that way (see: SpringBoot 2.1.9 -> 2.2.0 - health endpoint no longer works).
Is there a way to configure spring-boot to decouple the overall status of the /health endpoint from whether or not the datasource is up?
I'm inclined to just make my own endpoint hardcoded to return a status of 200.
I don't really understand what you're trying to do and how disabling all defaults achieved what you've described.
What would be the point of having an endpoint that returns 200 unconditionally? That's seriously misleading IMO.
If you do not want the datasource health indicator, then you can disable that (and only that) using management.health.db.enabled=false.
If you want the datasource health check but want to be able to ignore it, create a group that excludes the db health check and use that group for monitoring. See the documentation for more details.
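As a rough illustration of the group approach, the fragment below defines a health group that contains every indicator except db. This is only a sketch: "monitoring" is a hypothetical group name chosen for the example, and the properties follow the health groups feature introduced in Spring Boot 2.2.

```properties
# "monitoring" is a made-up group name; pick any name you like
management.endpoint.health.group.monitoring.include=*
management.endpoint.health.group.monitoring.exclude=db
```

With the default base path the group would then be served at /actuator/health/monitoring, while /actuator/health keeps reporting the datasource status.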

Active Standby in MicroService using Spring Boot

I have a microservice environment, but for one service I don't want load balancing. Instead it should work in active-standby mode (one instance at a time serves the requests). Only when the active instance goes down should requests be routed to the other instance, and even if the first one comes back up, the second instance should keep servicing the requests.
I am looking at options like overriding Ribbon's IRule or doing this in a @PreFilter that is present on each of these services.
Let me know if anyone has any implementation for the above case.
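To make the desired behaviour concrete, here is a minimal sketch of the "sticky failover" decision in plain Java. It is not a Ribbon implementation: ActiveStandbySelector and its isAlive predicate are hypothetical names, and in Ribbon the same decision would have to live inside a custom IRule.

```java
import java.util.List;
import java.util.function.Predicate;

/** Sticky failover: keep using the current instance until it stops responding. */
final class ActiveStandbySelector {
    private final List<String> instances;
    private int active = 0; // index of the instance currently serving requests

    ActiveStandbySelector(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no instances");
        }
        this.instances = List.copyOf(instances);
    }

    /**
     * Returns the instance that should serve the next request. The active
     * instance is kept while isAlive accepts it; otherwise the next live
     * instance becomes active and stays active even if the old one recovers.
     */
    synchronized String choose(Predicate<String> isAlive) {
        for (int i = 0; i < instances.size(); i++) {
            int candidate = (active + i) % instances.size();
            if (isAlive.test(instances.get(candidate))) {
                active = candidate;
                return instances.get(candidate);
            }
        }
        throw new IllegalStateException("no live instance");
    }
}
```

The important design choice is that the selector never switches back on its own: recovery of the first instance only matters once the second one fails.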

Spring cloud - how to get benefits of retry,load balancing and circuit breaker for distributed spring application

I want the following features in spring-cloud-Eureka backed microservices application.
1) Load balancing - if I have 3 nodes for one service, load balancing should happen between them
2) Retry logic - if one of the nodes does not respond, a retry should happen a certain number of times (e.g. 3, should be configurable) before falling back to another node.
3) Circuit breaker - if for some reason all 3 nodes of the service have issues accessing the DB and are throwing exceptions or not responding, the circuit should open, the fallback method should be called, and the circuit should close automatically after the service recovers.
Looking at many examples of Spring-cloud, I figured out
1) RestTemplate will help with option 1. But when RestTemplate accesses one instance of the service and the node fails, will it try the other two nodes?
2) Hystrix will help with the circuit breaker option (3 above). But if just one node is not responding, will it try the other nodes before opening the circuit and calling the fallback method? And will it automatically close the circuit once the service recovers?
3) How do I get retry logic with spring-cloud? I do know about the @Retryable annotation. But will it help in the following situation?
Retry with one node for 3 times and after it fails, try the next node 3 times and the last node 3 times before circuit breaker kicks in.
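Spelled out in plain Java, the behaviour described in the previous line looks roughly like the sketch below. It only illustrates the control flow and does not use Ribbon, Hystrix or Spring Retry; NodeRetry and callWithRetry are hypothetical names, and each node is assumed to be a Supplier that throws a RuntimeException on failure.

```java
import java.util.List;
import java.util.function.Supplier;

final class NodeRetry {

    /**
     * Tries each node up to attemptsPerNode times, in order, and returns the
     * first successful result. Rethrows the last failure if every node fails.
     */
    static <T> T callWithRetry(List<Supplier<T>> nodes, int attemptsPerNode) {
        if (nodes.isEmpty() || attemptsPerNode < 1) {
            throw new IllegalArgumentException("need at least one node and one attempt");
        }
        RuntimeException last = null;
        for (Supplier<T> node : nodes) {
            for (int attempt = 1; attempt <= attemptsPerNode; attempt++) {
                try {
                    return node.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and keep trying
                }
            }
        }
        // Every node is exhausted: this is the point where a circuit breaker
        // would count a failure and a fallback method would be called.
        throw last;
    }
}
```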
I see that all these configurations are available in spring-cloud, but I'm having a hard time understanding how to configure all of them for an efficient solution.
Here is one proposed solution:
@HystrixCommand
@Retryable
public Object doSomething() {
    // use your RestTemplate here
}
But I don't totally know if it is going to help me with all the subtleties I mentioned above.
I do see there is a @FeignClient. But from this blog, I understand that it provides a high level feature for HTTP client requests. Does it help with retry and circuit breaker and load balancing all-in-one?
Thanks
I do see there is a @FeignClient. Does it help with retry and circuit breaker and load balancing all-in-one?
If you are using the full spring-cloud stack, it actually solves everything you mentioned.
The netflix components in this scenario are the following in spring-cloud:
Eureka - Service Registry
Lets you dynamically register your services, so you only need to fix one host in your app (Eureka).
Ribbon - Load balancer
Out of the box it provides round-robin load balancing, but you can implement your own @RibbonClient (even for a specific service) and design your custom load balancing, for example based on Eureka metadata. The load balancing happens on the client side.
Feign - Http client
With @FeignClient you can rapidly develop clients for your other services (or services outside of your infrastructure). It is integrated with Ribbon and Eureka, so you can refer to your services with @FeignClient(yourServiceNameInEureka), and what you end up with is a client which load balances between the registered instances with your preferred logic. If you are using Spring you can use the familiar @RequestMapping annotation to describe the endpoint you are using.
Hystrix - Circuit breaker
By default your Feign clients will use Hystrix; every request will be wrapped in a Hystrix command. You can of course create Hystrix commands by hand and configure them for your needs.
You have to configure a little to get these working (actually just a few @Enable annotations on your configuration).
I highly recommend reading the provided spring documentation because it wraps up almost all of your aspects in a fairly quick read.
http://cloud.spring.io/spring-cloud-netflix/spring-cloud-netflix.html

Distinguish between expensive and inexpensive health checks

We typically ping /health very frequently in our highly available applications to determine when failover needs to happen. Spring Boot Actuator works well for this if the health indicators that are used don't make expensive calls to external dependencies like a database or web service. However, we like the ease of writing health indicators and how it plugs into the /health endpoint.
Is there any way to configure the Spring Boot Actuator such that only a subset of the indicators are executed in certain circumstances? If so, how?
Thanks!
You can control which health indicators are enabled using the management.health.<service>.enabled properties. For example, to switch off the database health check:
management.health.db.enabled: false
The full list of properties is available here. At the time of writing they are:
management.health.db.enabled
management.health.diskspace.enabled
management.health.mongo.enabled
management.health.rabbit.enabled
management.health.redis.enabled
management.health.solr.enabled
I'm in a similar situation right now. I just implemented a custom Endpoint which does the expensive health checks.
If you need the comfort of an HTTP endpoint you can also implement an AbstractEndpointMvcAdapter which does a similar HTTP status code mapping as Spring's HealthMvcEndpoint.
2021 Update
Health Groups provide the exact functionality you're asking about.
https://docs.spring.io/spring-boot/docs/2.5.3/reference/html/actuator.html#actuator.endpoints.health.groups
If you are using Spring Boot >= 2.2, you can use the separate library spring-boot-async-health-indicator to make your expensive health checks run on a separate thread by simply annotating them with @AsyncHealth.
This ensures that your /health endpoint always returns very fast and does not wait for those expensive health checks to complete.
Example:
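import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
// @AsyncHealth comes from the spring-boot-async-health-indicator library

@AsyncHealth
@Component
public class MyExpensiveHealthCheck implements HealthIndicator {

    @Override
    public Health health() {
        verySlowCheck(); // This method does not run when /health is called
        return Health.up().build();
    }
}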
Disclaimer: I created this library for this exact purpose
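For readers who want to see the general idea without the library: run the expensive check in the background and let the request path return the last known result. The sketch below uses only the JDK and is not how spring-boot-async-health-indicator is implemented; CachedHealthCheck, refresh and the "UNKNOWN" initial status are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Serves the last known result of a slow check instead of running it per request. */
final class CachedHealthCheck {
    private final Supplier<String> slowCheck;
    private final AtomicReference<String> lastStatus = new AtomicReference<>("UNKNOWN");

    CachedHealthCheck(Supplier<String> slowCheck) {
        this.slowCheck = slowCheck;
    }

    /** Runs the slow check; intended to be called from a background scheduler. */
    void refresh() {
        String status;
        try {
            status = slowCheck.get();
        } catch (RuntimeException e) {
            status = "DOWN"; // a failing check is reported, not propagated
        }
        lastStatus.set(status);
    }

    /** Called on the request path: returns immediately with the cached status. */
    String health() {
        return lastStatus.get();
    }
}
```

The refresh could be scheduled with `Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(check::refresh, 0, 30, TimeUnit.SECONDS)`. The trade-off is that the reported status can be up to one refresh interval old.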
