Distinguish between expensive and inexpensive health checks - spring-boot

We typically ping /health very frequently in our highly available applications to determine when failover needs to happen. Spring Boot Actuator works well for this if the health indicators that are used don't make expensive calls to external dependencies like a database or web service. However, we like the ease of writing health indicators and how it plugs into the /health endpoint.
Is there any way to configure the Spring Boot Actuator such that only a subset of the indicators are executed in certain circumstances? If so, how?
Thanks!

You can control which health indicators are enabled using the management.health.<service>.enabled properties. For example, to switch off the database health check:
management.health.db.enabled: false
The full list of properties is available here. At the time of writing they are:
management.health.db.enabled
management.health.diskspace.enabled
management.health.mongo.enabled
management.health.rabbit.enabled
management.health.redis.enabled
management.health.solr.enabled

I'm in a similar situation right now. I just implemented a custom Endpoint which does the expensive health checks.
If you need the comfort of an HTTP endpoint you can also implement an AbstractEndpointMvcAdapter which does a similar HTTP status code mapping as Spring's HealthMvcEndpoint.

2021 Update
Health Groups provide the exact functionality you're asking about.
https://docs.spring.io/spring-boot/docs/2.5.3/reference/html/actuator.html#actuator.endpoints.health.groups
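As a minimal sketch of what that looks like, assuming Spring Boot 2.2 or later and the default /actuator base path: the group name "cheap" below is a placeholder, and the indicator ids are the built-in disk space and ping indicators.

```properties
# Hypothetical group "cheap": only inexpensive indicators are included
management.endpoint.health.group.cheap.include=diskSpace,ping
```

Frequent failover probes could then call /actuator/health/cheap, while /actuator/health still runs every indicator.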

If you are using Spring Boot >= 2.2, you can use the separate library spring-boot-async-health-indicator to make your expensive health checks run on a separate thread by simply annotating them with @AsyncHealth.
This ensures that your /health endpoint always returns very quickly and does not wait for those expensive health checks to complete.
Example:
@AsyncHealth
@Component
public class MyExpensiveHealthCheck implements HealthIndicator {
    @Override
    public Health health() {
        verySlowCheck(); // This method does not run when /health is called
        return Health.up().build();
    }
}
Disclaimer: I created this library for this exact purpose
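The general idea behind this approach (compute the result in the background, serve the cached value) can be sketched in plain Java. This is not the library's implementation: the class CachedHealthCheck, its method names and the string statuses are all made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: caches the result of a slow check so readers never wait on it. */
class CachedHealthCheck {
    private final Supplier<String> slowCheck;
    // Last known status; "UNKNOWN" until the first refresh completes.
    private volatile String lastStatus = "UNKNOWN";

    CachedHealthCheck(Supplier<String> slowCheck) {
        this.slowCheck = slowCheck;
    }

    /** Runs the slow check and stores its result; a failure is recorded as DOWN. */
    void refresh() {
        try {
            lastStatus = slowCheck.get();
        } catch (RuntimeException e) {
            lastStatus = "DOWN";
        }
    }

    /** Returns immediately with the most recently cached status. */
    String status() {
        return lastStatus;
    }

    /** Schedules refresh() on a single background daemon thread with a fixed delay. */
    ScheduledExecutorService startRefreshing(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "health-refresh");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A health endpoint built this way would call status() on every request and never the slow check itself, at the cost of reporting a result that may be up to one refresh period old.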

Related

Spring Boot Actuator /health endpoint slow response?

First time I'm trying out the actuator dependency, I configured it to be opt-in
management.endpoints.enabled-by-default=false
management.endpoint.info.enabled=true
management.endpoint.health.enabled=true
When I call /actuator/health locally it takes around 1.4 seconds to respond. Keep in mind this is a local call, from the same machine of the server.
If I create a regular endpoint that replies with an empty response, the request would take just a couple of milliseconds.
Is this normal? Can I make it reply faster?
Original Answer: https://stackoverflow.com/a/63666118/1861769
Basically health endpoint is implemented in a way that it contains a
list of all Spring beans that implement the interface HealthIndicator.
Each health indicator is responsible for supplying health
information about one subsystem (examples of such subsystems are: disk,
postgres, mongo, etc.). Spring Boot comes with some predefined
HealthIndicators.
So that when the health endpoint is invoked, it iterates through this
list and gets the information about each subsystem and then constructs
the answer.
Hence you can place a break point in relevant health indicators
(assuming you know which subsystems are checked) and see what happens.
If you're looking for the HTTP entry point - the code that gets called
when you call http:///actuator/health (can vary depending
on your settings but you get the idea), it can be found here
Yet another approach that comes to mind is disabling "suspicious"
health check and finding the slow one by elimination.
For example, if you have an Elasticsearch health check and would like to disable
it, use in the application.properties:
management.health.elasticsearch.enabled = false
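Instead of disabling indicators one at a time, each check can also be timed. The following is a rough plain-Java sketch of that idea, not Actuator code: HealthTimer is a hypothetical helper, and the Supplier values stand in for whatever your real indicators do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper that times a set of named checks to find the slow one. */
class HealthTimer {
    /** Runs every check once and returns the elapsed time of each in milliseconds. */
    static Map<String, Long> timeChecks(Map<String, Supplier<String>> checks) {
        Map<String, Long> elapsedMillis = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<String>> entry : checks.entrySet()) {
            long start = System.nanoTime();
            try {
                entry.getValue().get();
            } catch (RuntimeException e) {
                // A failing check still counts: only its duration matters here.
            }
            elapsedMillis.put(entry.getKey(), (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMillis;
    }

    /** Returns the name of the slowest check, or null if there are none. */
    static String slowest(Map<String, Long> elapsedMillis) {
        String name = null;
        long max = -1;
        for (Map.Entry<String, Long> e : elapsedMillis.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                name = e.getKey();
            }
        }
        return name;
    }
}
```

In a Spring application the map could be filled from the HealthIndicator beans in the context, so the timings reflect the same checks that /actuator/health runs.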

Make spring-boot 2.2.0 report status = UP, even when the DB is down?

Up to spring-boot 2.1.9, I used to set management.health.defaults.enabled = false to decouple the /health endpoint overall status from the database status.
As of 2.2.0, that specific setting no longer works that way (see: SpringBoot 2.1.9 -> 2.2.0 - health endpoint no longer works).
Is there a way to configure spring-boot to decouple the overall status of the /health endpoint from whether or not the datasource is up?
I'm inclined to just make my own endpoint hardcoded to return a status of 200.
I don't really understand what you're trying to do and how disabling all defaults achieved what you've described.
What would be the point of having an endpoint that returns 200 unconditionally? That's seriously misleading IMO.
If you do not want the datasource health indicator, then you can disable that (and only that) using management.health.db.enabled=false.
If you want the datasource health check but want to be able to ignore it, create a group that excludes the db health check and use that for monitoring. See the documentation for more details.
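A minimal sketch of such a group, assuming Spring Boot 2.2 or later; the group name "monitoring" is a placeholder.

```properties
# Hypothetical group "monitoring": every indicator except the datasource one
management.endpoint.health.group.monitoring.exclude=db
```

With the default base path, the group would be served at /actuator/health/monitoring, while /actuator/health keeps reporting the datasource status.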

Limit number of parallel requests to spring boot actuator health

We are using Spring Boot Actuator to get the health status of an application. My understanding is that a health check request is handled by a thread from the same thread pool that serves actual service requests.
Is there a way to limit the number of requests to the health endpoint to prevent DDoS-type starvation?
You can use the Spring Boot Throttling community library. I think you could restrict DDoS-style access to your endpoints (Actuator or otherwise) using its configuration.
https://github.com/weddini/spring-boot-throttling
Another possibility to reduce DDOS vulnerability on the /health endpoint is to have your health checks run on a separate thread pool.
This ensures that:
no more than one health indicator concurrently runs at any given time against an underlying service
your /health endpoint returns instantly (as it returns healths pre-calculated on different threads).
For this purpose, and if you are using Spring Boot >= 2.2, you can use the separate library spring-boot-async-health-indicator to run your health checks on a separate thread pool by simply annotating them with @AsyncHealth.
Disclaimer: I created this library to address this issue (among others)
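If the goal is only to cap how many health requests run in parallel, the idea can be sketched with a plain java.util.concurrent.Semaphore. This is neither the throttling library's API nor part of Actuator: ConcurrencyLimiter and callOrReject are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper that caps the number of concurrent executions of an action. */
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the action if a permit is free; otherwise returns 'rejected' without blocking. */
    <T> T callOrReject(Supplier<T> action, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected;
        }
        try {
            return action.get();
        } finally {
            // Always hand the permit back, even if the action throws.
            permits.release();
        }
    }
}
```

In a web application this could sit in a servlet filter or handler interceptor in front of the health endpoint, returning an HTTP 429 as the rejected value, so that excess health requests are turned away cheaply instead of occupying request threads.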

Spring Boot health-based load balancing

I'm using Spring Boot for microservices, and I came across an issue with load balancing.
Spring Actuator adds special health and metrics endpoint to the apps; with this, some basic information can be acquired from the running instances.
What I would like to do is create a (reverse) proxy (e.g. with Zuul and/or Ribbon?) that acts as a centralized load balancer and selects instances by their health status.
For example, I have the following microservices
client
proxy (<- I would like to implement this)
server 1
server 2
When the client sends an HTTP request to the proxy, the proxy should be able to decide which of the two server instances has the least load, and forward the request to that one.
Is there an easy way to do this?
Thanks,
krisy
If you want to make a choice on various load-data, you could implement custom HealthIndicators that accumulate some kind of 'load over time' data, use this in your load balancer to decide where to send traffic.
All custom health indicators will be picked up by spring-boot, and invoked on the actuator /health endpoint.
@Component
class LoadIndicator implements HealthIndicator {
    @Override
    Health health() {
        def loadData = ... // do stuff to gather whatever load
        return Health.up()
            .withDetail("load", loadData)
            .build();
    }
}
Perhaps you could use some of Spring Boot's existing metrics; the actuator exposes multiple endpoints, such as /beans, /trace and /metrics. It should be possible to find that data in your application too.
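On the proxy side, once the "load" detail of each instance has been collected, the selection step itself is small. A minimal plain-Java sketch, assuming the load is a single number per instance where lower is better; LeastLoadedSelector is a hypothetical name, not a Zuul or Ribbon class.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical selector that picks the instance reporting the lowest load. */
class LeastLoadedSelector {
    /** Returns the least-loaded instance, or empty if no instance is known. */
    static Optional<String> pick(Map<String, Double> loadByInstance) {
        return loadByInstance.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

How the map is filled (polling each instance's /health endpoint, for example) and how stale values are handled are left open here.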

Spring cloud - how to get benefits of retry,load balancing and circuit breaker for distributed spring application

I want the following features in spring-cloud-Eureka backed microservices application.
1) Load balancing - if I have 3 nodes for one service, load balancing should happen between them.
2) Retry logic - if one of the nodes does not respond, a retry should happen a certain number of times (e.g. 3, should be configurable) before falling back to another node.
3) Circuit breaker - if for some reason all 3 nodes of the service have an issue accessing the db and are throwing exceptions or not responding, the circuit should open, the fallback method should be called, and the circuit should close automatically after the service recovers.
Looking at many examples of Spring-cloud, I figured out
1) RestTemplate will help with option 1. But when RestTemplate accesses one instance of the service and that node fails, will it try the other two nodes?
2) Hystrix will help with the circuit breaker option (3 above). But if just one node is not responding, will it try the other nodes before opening the circuit and calling the fallback method? And will it automatically close the circuit once the service recovers?
3) How do I get retry logic with spring-cloud? I do know about the @Retryable annotation. But will it help in the following situation?
Retry with one node for 3 times and after it fails, try the next node 3 times and the last node 3 times before circuit breaker kicks in.
I see that all these configurations are available in spring cloud. but having a hard-time understanding how to configure for all these for efficient solution.
Here is one proposed:
@HystrixCommand
@Retryable
public Object doSomething() {
    // use your RestTemplate here
}
But I don't totally know if it is going to help me with all the subtleties I mentioned above.
I do see there is a @FeignClient. But from this blog, I understand that it provides a high level feature for HTTP client requests. Does it help with retry and circuit breaker and load balancing all-in-one?
Thanks
I do see there is a @FeignClient. Does it help with retry and circuit breaker and load balancing all-in-one?
If you are using the full spring-cloud stack, it actually solves everything you mentioned.
The netflix components in this scenario are the following in spring-cloud:
Eureka - Service Registry
Lets you dynamically register your services so you only need to fix one host in your app (eureka).
Ribbon - Load balancer
Out of the box it provides round-robin load balancing, but you can implement your own @RibbonClient (even for a specific service) and design custom load balancing, for example based on eureka metadata. The load balancing happens on the client side.
Feign - Http client
With @FeignClient you can rapidly develop clients for your other services (or services outside of your infrastructure). It is integrated with ribbon and eureka, so you can refer to your services with @FeignClient(yourServiceNameInEureka), and what you end up with is a client which load balances between the registered instances with your preferred logic. If you are using spring you can use the familiar @RequestMapping annotation to describe the endpoint you are using.
Hystrix - Circuit breaker
By default your feign clients will use hystrix; every request will be wrapped in a hystrix command. You can of course create hystrix commands by hand and configure them for your needs.
You have to configure a little to get these working (actually just a few @Enable annotations on your configuration).
I highly recommend reading the provided spring documentation because it wraps up almost all of your aspects in a fairly quick read.
http://cloud.spring.io/spring-cloud-netflix/spring-cloud-netflix.html
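The retry order described in the question (try one node a fixed number of times, then the next node, then fall back) can be made concrete with a plain-Java sketch. It only illustrates those semantics and is not how Ribbon, Feign or Hystrix implement them; RetryingCaller and its parameters are hypothetical, and it does not include a circuit breaker.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper showing "retry per node, then next node, then fallback". */
class RetryingCaller {
    /**
     * Tries each node up to attemptsPerNode times, in order, and returns the
     * first successful response; returns the fallback if every attempt fails.
     */
    static <T> T call(List<String> nodes, int attemptsPerNode,
                      Function<String, T> request, T fallback) {
        for (String node : nodes) {
            for (int attempt = 1; attempt <= attemptsPerNode; attempt++) {
                try {
                    return request.apply(node);
                } catch (RuntimeException e) {
                    // Swallow and retry; a real client would log and back off here.
                }
            }
        }
        return fallback;
    }
}
```

With 3 nodes and 3 attempts per node, a fully failing service costs 9 calls before the fallback is returned, which is the situation a circuit breaker is meant to short-circuit.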
