Is there any way to support multiple health endpoints on a Spring Boot application?
Here's why: the standard actuator health check is great, the built-in checks are great, the customization options are great - for a single use case: reporting on general application health.
But I'd like something I can call from an AWS Elastic Load Balancer / AutoScaling Group. By default, if an instance fails a health check, the ELB/ASG will terminate it and replace it with a fresh instance. The problem is that some of the health checks, like DataSourceHealthIndicator, will report DOWN if the database is down, even though my application instance is otherwise perfectly healthy. If I use the default behavior, AWS will throw out perfectly healthy instances until the database comes back up, and this will run up my bill.
I could get rid of the DataSourceHealthIndicator, but I like having it around for general health checking purposes. So what I really want is two separate endpoints for two different purposes, such as:
/health - General application health
/ec2Health - Ignores aspects unrelated to the EC2 instance, such as a DB outage.
Hope that makes sense.
Spring Boot Actuator has a feature called Health Groups which lets you organize health indicators into groups and query each group at its own endpoint.
In application.properties you configure the groups that you want:
management.endpoints.web.path-mapping.health=probes
management.endpoint.health.group.health.include=*
management.endpoint.health.group.health.show-details=never
management.endpoint.health.group.detail.include=*
management.endpoint.health.group.detail.show-details=always
management.endpoint.health.group.other.include=diskSpace,ping
management.endpoint.health.group.other.show-details=always
Output:
$ curl http://localhost:8080/actuator/probes
{"status":"UP","groups":["detail","health","other"]}
$ curl http://localhost:8080/actuator/probes/health
{"status":"UP"}
$ curl http://localhost:8080/actuator/probes/detail
{"status":"UP","components":{"diskSpace":{"status":"UP","details":{"total":0,"free":0,"threshold":0,"exists":true}},"ping":{"status":"UP"},"rabbit":{"status":"UP","details":{"version":"3.6.16"}}}}
$ curl http://localhost:8080/actuator/probes/other
{"status":"UP","components":{"diskSpace":{"status":"UP","details":{"total":0,"free":0,"threshold":0,"exists":true}},"ping":{"status":"UP"}}}
Related
I'd like to add a new on-demand health check endpoint to my service by implementing an indicator. The problem was that the on-demand check would be included in the default `/actuator/health`, so I have split the default health endpoint into two health groups, `/actuator/health/default` and `/actuator/health/on-demand`, as I didn't find any way to remove the on-demand check directly from `/actuator/health`.
Now a new issue emerged: by default, Spring Boot Admin hits /actuator/health to get the corresponding info. I was wondering whether it's possible to make it hit /actuator/health/default instead?
BTW, I only have the admin client, without any discovery service
haha, this config is the answer: spring.boot.admin.client.instance.health-url
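For reference, that Spring Boot Admin client setting looks like this (host and port are placeholders):

spring.boot.admin.client.instance.health-url=http://localhost:8080/actuator/health/default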
This is the first time I'm trying out the actuator dependency; I configured it to be opt-in:
management.endpoints.enabled-by-default=false
management.endpoint.info.enabled=true
management.endpoint.health.enabled=true
When I call /actuator/health locally it takes around 1.4 seconds to respond. Keep in mind this is a local call, from the same machine as the server.
If I create a regular endpoint that replies with an empty response, the request takes just a couple of milliseconds.
Is this normal? Can I make it reply faster?
Original Answer: https://stackoverflow.com/a/63666118/1861769
Basically, the health endpoint is implemented so that it holds a list of all Spring beans that implement the HealthIndicator interface. Each health indicator is responsible for supplying health information about one subsystem (examples of such subsystems: disk, Postgres, Mongo, etc.); Spring Boot comes with some predefined HealthIndicators.
When the health endpoint is invoked, it iterates through this list, gets the information about each subsystem, and then constructs the answer.
Hence you can place a breakpoint in the relevant health indicators (assuming you know which subsystems are checked) and see what happens.
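For illustration, a minimal custom indicator of the kind you'd break-point into (CacheHealthIndicator and pingCache are made-up names; built-in indicators like DataSourceHealthIndicator follow the same shape):

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class CacheHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        // A slow call here slows down every /actuator/health request,
        // since the endpoint aggregates all registered indicators.
        boolean reachable = pingCache();
        return reachable
                ? Health.up().build()
                : Health.down().withDetail("cache", "unreachable").build();
    }

    // Hypothetical check; replace with a real connectivity test.
    private boolean pingCache() {
        return true;
    }
}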
If you're looking for the HTTP entry point - the code that gets called when you hit http://<host>/actuator/health (the exact path can vary depending on your settings, but you get the idea) - it can be found in the Spring Boot Actuator sources.
Yet another approach that comes to mind is disabling "suspicious" health checks and finding the slow one by elimination. For example, if you have an Elasticsearch health check and would like to disable it, put this in application.properties:
management.health.elasticsearch.enabled=false
Up to spring-boot 2.1.9, I used to set management.health.defaults.enabled=false to decouple the /health endpoint's overall status from the database status.
As of 2.2.0, that specific setting no longer works that way (see: SpringBoot 2.1.9 -> 2.2.0 - health endpoint no longer works).
Is there a way to configure spring-boot to decouple the overall status of the /health endpoint from whether or not the datasource is up?
I'm inclined to just make my own endpoint hardcoded to return a status of 200.
I don't really understand what you're trying to do and how disabling all defaults achieved what you've described.
What would be the point of having an endpoint that returns 200 unconditionally? That's seriously misleading IMO.
If you do not want the datasource health indicator, then you can disable that (and only that) using management.health.db.enabled=false.
If you want the datasource health check but want to be able to ignore it, create a group that excludes the db health check and use that group for monitoring. See the documentation for more details.
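A sketch of such a group, assuming the group name monitoring (pick any name you like):

management.endpoint.health.group.monitoring.include=*
management.endpoint.health.group.monitoring.exclude=db

Your monitoring then calls /actuator/health/monitoring, while the plain /actuator/health still reflects the database.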
I have two Docker containers that I launch with docker-compose.
One holds a Cassandra instance
One holds a Spring Boot application that tries to connect to that Cassandra instance.
However, the Spring Boot application will always fail, because it's trying to connect to a Cassandra instance that is not ready yet to take connections.
I have tried:
Using restart: always in docker-compose
This still doesn't always work, because Cassandra might be up 'enough' to no longer crash the Spring Boot application, but not up 'enough' to have successfully created the table/column family. On top of that, this is a very hacky solution.
Using healthcheck
It seems like healthcheck in compose doesn't have restart capabilities
Using a bash script as entrypoint
In the hope that I could use netstat, ping, ... whatever, to determine the readiness state of Cassandra
Right now the only thing that really works is using that same bash script to sleep for x seconds, then start the jar. This is even more hacky...
Does anyone have an idea on how to solve this?
Thanks!
Does the Spring Boot service defined in the docker-compose.yml use depends_on for the cassandra service? Note that depends_on by itself only controls startup order; the dependent service waits for cassandra to actually be ready only when depends_on is combined with a healthcheck (via condition: service_healthy, where the Compose file format supports it).
https://docs.docker.com/compose/compose-file/#depends_on
Take a look at this GitHub repository to find a healthcheck for the cassandra service.
https://github.com/docker-library/healthcheck
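As a sketch, a cqlsh-based healthcheck combined with a depends_on condition (this assumes a Compose version that supports condition: service_healthy; the image tag and timings are placeholders):

services:
  cassandra:
    image: cassandra:3.11
    healthcheck:
      # cqlsh only succeeds once Cassandra accepts CQL connections
      test: ["CMD-SHELL", "cqlsh -e 'DESCRIBE KEYSPACES' || exit 1"]
      interval: 15s
      timeout: 10s
      retries: 10
  app:
    build: .
    depends_on:
      cassandra:
        condition: service_healthy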
CONCLUSION
After some discussion we found out that docker-compose seems not to provide functionality for waiting until services are up and healthy, the way Kubernetes and OpenShift do (see comments below). The recommended workaround is a wrapper script (docker-entrypoint.sh) that waits for the depended-on service to come up, but that requires binaries in the image that the actual service shouldn't need, such as the cassandra client binary. Additionally, the service depending on cassandra would never come up if cassandra doesn't, which shouldn't happen.
A main point with microservices is that they have to be resilient to failures: they are not supposed to die, or to fail to come up, just because a depended-on service is currently unavailable or unexpectedly disappears. Therefore the microservice should be implemented so that it retries the connection after startup or after an unexpected disappearance. 'Unexpected' is actually the wrong word in this context, because you should always expect such issues in a distributed environment, and even with docker-compose you will face issues like that, as discussed in this topic.
The following link points to a tutorial which helped to integrate cassandra properly into a Spring Boot application. It shows how to implement the retrieval of a Cassandra connection with retry behavior, so the service is resilient to a missing Cassandra database and no longer fails to start. Hope this helps others as well.
https://dzone.com/articles/containerising-a-spring-data-cassandra-application
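The gist of the retry approach, as a rough sketch (class and parameter names are made up; the call assumes the DataStax Java driver 4.x with contact points supplied via configuration):

import com.datastax.oss.driver.api.core.CqlSession;

public final class CassandraSessionFactory {

    // Keep retrying until Cassandra accepts connections instead of
    // failing the whole application at startup.
    public static CqlSession connectWithRetry(int maxAttempts, long backoffMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return CqlSession.builder().build();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}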
We have an abundance of Spring Boot applications running in containers (on OpenShift) that access centralized infrastructure (external to the pod) such as databases, queues, etc.
If a piece of central infrastructure is down, the health check returns "unhealthy" (rightfully so). The problem is that the liveness check sees this and restarts the pod (the readiness check then sees it's down too, so the pod receives no traffic). This is fine when only a few applications are affected, but if many (potentially hundreds) of applications are using this infrastructure, it forces restarts on all of them (a crash loop).
I understand that central infrastructure being down is a bad thing. It "should" never happen. But... if it does (Murphy's law), it throws containers into a frenzy. Just seems like we're either doing something wrong, or we should reconfigure something.
A couple questions:
If you are forced to use centralized infrastructure from a Spring Boot app running in a container on OpenShift/Kubernetes, should all actuator checks for the backends still be enabled? (Bouncing the container won't fix the backend being down anyway.)
Should the /actuator/health endpoint be used for both the liveness probe and the readiness probe?
What common settings do folks use for the readiness/liveness probes in a Spring Boot app (timeouts, intervals, etc.)?
Using actuator checks for liveness/readiness is the de facto way to check for a healthy app in a Spring Boot pod. Your application, once up, should ideally not go down or become unhealthy just because a central piece such as the DB or queueing service goes down; ideally you should add some sort of resiliency that either connects to an alternate DR site or waits a certain time period for the central service to come back up and the app to reconnect. This is more of a technical failure on the backend side causing a functional failure of your application after it started up cleanly.
Yes, both liveness and readiness are required, as they serve different purposes. Read this
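Worth noting: since Spring Boot 2.3 the actuator can expose dedicated probe groups, so the two probes don't have to share the full /actuator/health result:

management.endpoint.health.probes.enabled=true

This adds /actuator/health/liveness and /actuator/health/readiness, which by default only reflect the application's own state (not external dependencies like the database), which also addresses the crash-loop concern above.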
In one of my previous projects, the setting used for readiness was around 30 seconds and liveness around 90, but to be honest this is completely dependent on your application: if your app takes 1 minute to start, that is what your readiness delay should be configured at, and your liveness should factor in the same, along with any time required for making a failover switch of your backend services.
Hope this helps.