We have many Spring Boot applications running in containers (on OpenShift) that access centralized infrastructure outside the pod, such as databases and queues.
If a piece of central infrastructure is down, the health check returns "unhealthy" (rightfully so). The problem is that the liveness probe sees this and restarts the pod, and the readiness probe sees the same failure, so the pod never becomes ready either. This is tolerable when only a few applications depend on that infrastructure. When many (potentially hundreds) of applications use it, all of them are forced into restarts (a crash loop).
I understand that central infrastructure being down is a bad thing. It "should" never happen. But if it does (Murphy's law), it throws the containers into a frenzy. It seems like we're either doing something wrong or should reconfigure something.
A couple questions:
If you are forced to use centralized infrastructure from a Spring Boot app running in a container on OpenShift/Kubernetes, should the actuator health checks for those backends still be enabled? (Bouncing the container won't fix the backend being down anyway.)
Should the /actuator/health endpoint be used for both the liveness probe and the readiness probe?
What settings do people commonly use for the readiness/liveness probes in a Spring Boot app (timeouts, intervals, etc.)?
Using actuator checks for liveness/readiness is the de facto way to check that a Spring Boot app in a pod is healthy. Once your application is up, it should ideally not go down or become unhealthy when a central piece such as the DB or the queueing service goes down. Instead, add some resiliency: either connect to an alternate DR site, or wait a certain period for the central service to come back up and let the app reconnect. A backend outage is a technical failure on the backend side that causes a functional failure of your application after it has started up cleanly.
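Here is a minimal plain-JDK sketch of the "wait and reconnect, or fail over to a DR site" idea. The Reconnect helper, the endpoint list and the delays are assumptions for illustration, not a Spring API. In a real app each Supplier would open a connection through your driver. The point is that a backend outage becomes a series of retries instead of a dead app that liveness then restarts.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical helper: try the primary backend, then alternates such as a DR site,
 * and back off exponentially between rounds instead of letting the app die.
 */
final class Reconnect {
    private Reconnect() {}

    static <T> T withBackoff(List<Supplier<T>> endpoints, int maxRounds,
                             Duration initialDelay, Duration maxDelay) {
        RuntimeException last = null;
        long delayMs = initialDelay.toMillis();
        for (int round = 1; round <= maxRounds; round++) {
            for (Supplier<T> endpoint : endpoints) {
                try {
                    return endpoint.get(); // first endpoint that answers wins
                } catch (RuntimeException e) {
                    last = e; // remember the failure, try the next endpoint
                }
            }
            if (round < maxRounds) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                    throw new IllegalStateException("interrupted while waiting to reconnect", ie);
                }
                delayMs = Math.min(delayMs * 2, maxDelay.toMillis()); // exponential, capped
            }
        }
        throw new IllegalStateException("all endpoints failed after " + maxRounds + " rounds", last);
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated primary DB: down for the first two attempts, then back.
        Supplier<String> primary = () -> {
            if (++attempts[0] < 3) throw new IllegalStateException("primary DB down");
            return "connected to primary";
        };
        Supplier<String> drSite = () -> { throw new IllegalStateException("DR site down"); };
        System.out.println(Reconnect.withBackoff(List.of(primary, drSite), 5,
                Duration.ofMillis(100), Duration.ofSeconds(2)));
    }
}
```

While this loop runs, readiness can report the backend as down, but liveness has no reason to fail.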
Yes, both liveness and readiness probes are required, as they serve different purposes.
In one of my previous projects, the readiness setting was around 30 seconds and liveness around 90. Honestly, though, this depends entirely on your application. If your app takes 1 minute to start, that is what your readiness time should be configured to. Your liveness time should account for that same startup time plus any time your backend services need to fail over.
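The rule "liveness should cover startup plus failover time" can be turned into numbers. Kubernetes restarts a container after roughly failureThreshold * periodSeconds of consecutive liveness failures (the defaults are periodSeconds=10 and failureThreshold=3). The helper below only does that arithmetic. The 60-second startup and 30-second failover figures are assumptions, chosen to match the ~90 seconds mentioned above.

```java
/**
 * Hypothetical sizing helper for Kubernetes probes. Kubernetes acts after roughly
 * failureThreshold * periodSeconds of consecutive probe failures.
 */
final class ProbeBudget {
    private ProbeBudget() {}

    static int toleratedSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    /** Smallest failureThreshold whose budget covers secondsToTolerate. */
    static int thresholdFor(int periodSeconds, int secondsToTolerate) {
        if (periodSeconds <= 0) throw new IllegalArgumentException("periodSeconds must be positive");
        return Math.max(1, (secondsToTolerate + periodSeconds - 1) / periodSeconds); // ceiling division
    }

    public static void main(String[] args) {
        int startupSeconds = 60;  // assumed: app takes ~1 minute to start
        int failoverSeconds = 30; // assumed: backend failover switch time
        int period = 10;          // Kubernetes default periodSeconds
        int threshold = thresholdFor(period, startupSeconds + failoverSeconds);
        System.out.println("liveness failureThreshold=" + threshold
                + " (~" + toleratedSeconds(period, threshold) + "s)"); // 9 (~90s)
    }
}
```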
Hope this helps.
I need to design a solution able to process an average of 150k requests per day.
I was thinking of exposing a REST API from a Spring Boot app running on AWS EKS (the current tech approved by our CTO), but I'm wondering about the limits.
Is there a knowledge base where I can read about any cap for such a scenario (API limits for a Spring Boot app on EKS, considering pod replicas)?
If this would not work, how would you do it? I was thinking my customer could write to a Kafka topic that my Spring Boot app reads from (a streaming approach).
The goal is to take requests from my customer's app and forward them to my backend system, which does the processing.
I don't see any reason why it wouldn't work. EKS provides Kubernetes as a managed service, so as long as you provision the required resources, performance depends directly on the application itself.
150K requests per day is around 1.7 requests per second, which is definitely a manageable volume (depending, of course, on your app's logic).
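Here is the arithmetic as a tiny sketch. A day has 86,400 seconds, so 150,000 / 86,400 ≈ 1.74 requests per second on average. Real traffic is rarely flat, so the peak factor of 10 used here is purely an assumption. It illustrates that replicas should be sized for the peak, not the average.

```java
final class Throughput {
    static final int SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    /** peakFactor is an assumed ratio between the busiest second and the daily average. */
    static double peakPerSecond(long requestsPerDay, double peakFactor) {
        return averagePerSecond(requestsPerDay) * peakFactor;
    }

    public static void main(String[] args) {
        System.out.printf("average: %.2f req/s%n", averagePerSecond(150_000));               // 1.74
        System.out.printf("peak (x10, assumed): %.1f req/s%n", peakPerSecond(150_000, 10)); // 17.4
    }
}
```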
I have a Spring Batch application that will be deployed to Kubernetes. It doesn't include spring-boot-starter-web, since it only runs cron jobs. Is there any way to expose the Spring Boot Actuator health endpoints without adding the starter-web dependency?
Batch applications are ephemeral by nature. It does not make sense to expose an actuator endpoint only for the duration of the job (would the job still be running when you query the endpoint?). This has been discussed here: https://github.com/spring-projects/spring-boot/issues/21024.
Liveness Probe and Readiness Probe for Spring Batch
Same here. A readiness probe is typically used to see whether a service is ready to accept requests. I'm not sure it makes sense to have a readiness probe for batch jobs. A liveness probe could make sense, to see whether a job is "live", in which case you need to define what "live" means. I've never seen such a probe implemented in practice, though, because there are other ways to report whether a job is running (such as live metrics with Micrometer).
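If you do want a liveness signal for a batch job, one possible definition of "live" is "has made progress recently". Below is a minimal sketch assuming that definition; JobHeartbeat and the 5-minute threshold are hypothetical. The job would call recordProgress() after each processed chunk. isLive() could then back a probe or be published as a metric (for example through Micrometer). Timestamps are passed in explicitly so the logic is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical heartbeat: a job counts as "live" if it reported progress within maxSilence. */
final class JobHeartbeat {
    private final AtomicReference<Instant> lastProgress;
    private final Duration maxSilence;

    JobHeartbeat(Instant start, Duration maxSilence) {
        this.lastProgress = new AtomicReference<>(start);
        this.maxSilence = maxSilence;
    }

    /** Called by the job, e.g. after each processed chunk. */
    void recordProgress(Instant now) {
        lastProgress.set(now);
    }

    /** True if the last progress report is at most maxSilence old. */
    boolean isLive(Instant now) {
        return Duration.between(lastProgress.get(), now).compareTo(maxSilence) <= 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        JobHeartbeat heartbeat = new JobHeartbeat(t0, Duration.ofMinutes(5));
        heartbeat.recordProgress(t0.plusSeconds(120));
        System.out.println(heartbeat.isLive(t0.plusSeconds(300))); // true: last progress 3 minutes ago
        System.out.println(heartbeat.isLive(t0.plusSeconds(800))); // false: silent for 11m20s
    }
}
```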
According to the documentation:
"An application is considered ready as soon as application and command-line runners have been called, see Spring Boot application lifecycle and related Application Events."
So it doesn't seem to consider external systems such as a database. Is this correct?
How could we make the ReadinessStateHealthIndicator evaluate the state of such systems, so that the pod is taken out of the Kubernetes Service's load balancing when they are failing or unavailable?
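Spring Boot 2.3 and later let you add indicators to the readiness health group, for example management.endpoint.health.group.readiness.include=readinessState,db. Keep the trade-off from the first question in mind. A failing readiness probe only takes the pod out of the Service's endpoints and does not restart it. But if every replica depends on the same database, they all become unready at once. The plain-JDK model below shows the split: external dependencies feed readiness only, never liveness. It is not Spring's API; ProbeModel and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Plain-JDK model (not Spring's API) of which checks should feed which probe.
 * Liveness: only the app's own state. Readiness: startup done AND external dependencies up.
 */
final class ProbeModel {
    private final Map<String, BooleanSupplier> dependencyChecks = new LinkedHashMap<>();
    private volatile boolean startupComplete;          // roughly Spring's ReadinessState.ACCEPTING_TRAFFIC
    private volatile boolean internallyHealthy = true; // roughly Spring's LivenessState.CORRECT

    ProbeModel addDependency(String name, BooleanSupplier check) {
        dependencyChecks.put(name, check);
        return this;
    }

    void markStartupComplete() { startupComplete = true; }

    void markBroken() { internallyHealthy = false; }

    /** Failing this makes Kubernetes restart the container. */
    boolean live() { return internallyHealthy; }

    /** Failing this only takes the pod out of the Service's endpoints. */
    boolean ready() {
        if (!startupComplete) return false;
        for (BooleanSupplier check : dependencyChecks.values()) {
            if (!check.getAsBoolean()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] dbUp = {true};
        ProbeModel probes = new ProbeModel().addDependency("db", () -> dbUp[0]);
        probes.markStartupComplete();
        System.out.println("live=" + probes.live() + " ready=" + probes.ready()); // live=true ready=true
        dbUp[0] = false; // the central database goes down
        System.out.println("live=" + probes.live() + " ready=" + probes.ready()); // live=true ready=false
    }
}
```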
I have two Docker containers that I launch with docker-compose.
One holds a Cassandra instance.
One holds a Spring Boot application that tries to connect to that Cassandra instance.
However, the Spring Boot application always fails, because it tries to connect to a Cassandra instance that is not yet ready to accept connections.
I have tried:
Using restart: always in docker-compose
This still doesn't always work. Cassandra might be up "enough" to no longer crash the Spring Boot application, but not up "enough" to have created the table/column family. On top of that, this is a very hacky solution.
Using healthcheck
It seems that healthcheck in compose doesn't have restart capabilities.
Using a bash script as entrypoint
I hoped I could use netstat, ping, or whatever to determine the readiness state of Cassandra.
Right now the only thing that really works is using that same bash script to sleep for x seconds and then start the jar. This is even more hacky...
Does anyone have an idea on how to solve this?
Thanks!
Does the Spring Boot service defined in docker-compose.yml declare depends_on on the cassandra service? If so, the service is started only when the cassandra service is ready.
https://docs.docker.com/compose/compose-file/#depends_on
Take a look at this GitHub repository for a healthcheck for the Cassandra service:
https://github.com/docker-library/healthcheck
CONCLUSION
After some discussion, we found that docker-compose does not seem to provide a way to wait until services are up and healthy, as Kubernetes and OpenShift do. The recommended workaround is a wrapper script (docker-entrypoint.sh) that waits for the dependency to come up. That script, however, needs binaries the actual service shouldn't depend on, such as the Cassandra client binary. Additionally, the service depending on Cassandra could never come up if Cassandra doesn't, which shouldn't happen.
A key principle of microservices is that they must be resilient to failures. They must not die, or fail to come up, when a dependency is unavailable or unexpectedly disappears. The microservice should therefore retry obtaining a connection after startup and after an unexpected disappearance. "Unexpected" is actually the wrong word here, because you should always expect such issues in a distributed environment. Even with docker-compose you will face them, as discussed in this topic.
The following tutorial helped us integrate Cassandra properly into a Spring Boot application. It shows how to obtain a Cassandra connection with retry behavior. As a result, the service is resilient to a missing Cassandra database and no longer fails to start. Hope this helps others as well.
https://dzone.com/articles/containerising-a-spring-data-cassandra-application
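For the docker-compose case, here is a minimal plain-JDK sketch of "retry until the dependency answers, up to a deadline" instead of a fixed sleep. It only checks that the TCP port accepts connections. As noted in the question, that does not guarantee Cassandra has finished creating tables; a real check would run a trivial query through the driver. The service name cassandra and the timeouts are assumptions. 9042 is Cassandra's default CQL port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical startup gate: poll a TCP port until it accepts connections or the deadline passes. */
final class WaitForPort {
    private WaitForPort() {}

    /** Returns true once host:port accepts a TCP connection, false if the deadline passes first. */
    static boolean await(String host, int port, Duration timeout, Duration retryInterval) {
        Instant deadline = Instant.now().plus(timeout);
        int connectTimeoutMs = (int) Math.max(1, retryInterval.toMillis());
        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
                return true; // something is listening on the port
            } catch (IOException e) {
                // refused, timed out or host not resolvable yet: not up yet
            }
            if (Instant.now().plus(retryInterval).isAfter(deadline)) {
                return false; // the next attempt would start after the deadline
            }
            try {
                Thread.sleep(retryInterval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // "cassandra" is the assumed compose service name.
        boolean up = await("cassandra", 9042, Duration.ofMinutes(2), Duration.ofSeconds(2));
        System.out.println(up ? "Cassandra port reachable" : "gave up waiting for Cassandra");
    }
}
```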
I understand that Spring Boot has an embedded Tomcat (or Jetty) server, which facilitates rapid development. But what do you do when you need to scale out your application because traffic has increased?
As pointed out in the comments, there is no silver bullet here. It depends on your infrastructure, and there are several tools out there to help you; you only need to choose what works best for you.
For load balancing, you can either choose something like Nginx or leave it to Spring Cloud, which also has many other handy features for scaling/clustering.
Scaling shouldn't be very hard, because Spring Boot runs on its own embedded server.
Some tools that help with scaling/clustering:
Spring Boot app:
If you are going to scale, your app has to be near-stateless. For example, you cannot simply have a scheduled task, because when you scale to x instances it is executed x times.
You can use the Spring Cloud project for extra features, such as service discovery and other goodies that make scaling easier. For example, when you spin up a new instance, it can easily get its config from a config server and "register" itself to ease load balancing between services, giving cluster-like behavior.
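To keep replicas from running the same scheduled task x times, each instance can try to take a time-limited lease, and only the winner runs the job. Libraries such as ShedLock implement this idea against a shared database. In this sketch a ConcurrentHashMap stands in for that shared store, so it only works within one JVM. SharedLock, the lock name and the TTL are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical lease lock so a scheduled task runs on one replica only.
 * The ConcurrentMap stands in for a store all replicas share (e.g. a DB table);
 * in a real cluster the compare-and-set would be a conditional UPDATE/INSERT there.
 */
final class SharedLock {
    private static final class Lease {
        final String owner;
        final Instant until;

        Lease(String owner, Instant until) {
            this.owner = owner;
            this.until = until;
        }
    }

    private final ConcurrentMap<String, Lease> leases = new ConcurrentHashMap<>();

    /** Atomically takes the lease if it is free or expired; true means "you run the task". */
    boolean tryAcquire(String lockName, String instanceId, Instant now, Duration ttl) {
        Lease fresh = new Lease(instanceId, now.plus(ttl));
        Lease winner = leases.compute(lockName, (name, current) ->
                (current == null || current.until.isBefore(now)) ? fresh : current);
        return winner == fresh;
    }

    /** Current lease holder, or null if the lock was never taken. */
    String holder(String lockName) {
        Lease lease = leases.get(lockName);
        return lease == null ? null : lease.owner;
    }

    public static void main(String[] args) {
        SharedLock lock = new SharedLock();
        Instant tick = Instant.parse("2024-01-01T02:00:00Z"); // every replica's scheduler fires now
        for (String pod : new String[] {"pod-a", "pod-b", "pod-c"}) {
            boolean runs = lock.tryAcquire("nightly-report", pod, tick, Duration.ofMinutes(10));
            System.out.println(pod + (runs ? " runs the job" : " skips it"));
        }
    }
}
```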
Infrastructure and containers:
Docker is a no-brainer here for easily launching your applications and, if needed, their replicas. If you have the resources, you can go further and use Kubernetes, but it all depends on the use case.
Multiple servers (nodes), so that one of them can fail and load is easy to distribute.
Nginx for load balancing is pretty straightforward if you don't already have something in place with Spring Cloud.
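Round-robin, Nginx's default upstream balancing method, simply hands each new request to the next backend in the list. Here is a minimal sketch for intuition; the backend addresses are made up.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal thread-safe round-robin picker over a fixed list of backends. */
final class RoundRobin<T> {
    private final List<T> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<T> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("need at least one backend");
        this.backends = List.copyOf(backends);
    }

    T pick() {
        // floorMod keeps the index valid even after the counter overflows
        return backends.get(Math.floorMod(next.getAndIncrement(), backends.size()));
    }

    public static void main(String[] args) {
        RoundRobin<String> lb = new RoundRobin<>(List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"));
        for (int i = 0; i < 5; i++) {
            System.out.println(lb.pick()); // .1, .2, .3, .1, .2
        }
    }
}
```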
Database:
You really do NOT want to go with MySQL here, because it cannot scale as well as your Spring apps. You can choose something like Cassandra or Redis, but that would mean restructuring your data model. Maybe the least painful transition from MySQL to a NoSQL store that can scale is MongoDB (IMHO, Cassandra performs better).
Logging:
This can be a nightmare, but Spring also has a solution for it. Check out Zipkin and Spring Cloud Sleuth.
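The core idea behind Sleuth/Zipkin-style tracing is a trace ID that every service reuses and passes on, so log lines from different services can be correlated. Below is a minimal sketch of that propagation step only. The header name follows the B3 convention used by Zipkin. TraceContext, the service names and the plain Map of headers (real HTTP header names are case-insensitive) are simplifications for illustration. Sleuth handles this automatically, along with spans and reporting.

```java
import java.security.SecureRandom;
import java.util.Map;

/** Hypothetical trace-id propagation: reuse the incoming id or start a new trace. */
final class TraceContext {
    static final String TRACE_HEADER = "X-B3-TraceId"; // B3 header name used by Zipkin
    private static final SecureRandom RANDOM = new SecureRandom();

    private TraceContext() {}

    /** Simplified: looks the header up with exact case, unlike real HTTP handling. */
    static String traceIdFor(Map<String, String> incomingHeaders) {
        String existing = incomingHeaders.get(TRACE_HEADER);
        if (existing != null && !existing.isBlank()) {
            return existing; // join the caller's trace
        }
        // Start a new trace: 128-bit id as 32 lowercase hex characters.
        return String.format("%016x%016x", RANDOM.nextLong(), RANDOM.nextLong());
    }

    /** Headers to send downstream so the next service joins the same trace. */
    static Map<String, String> outgoingHeaders(String traceId) {
        return Map.of(TRACE_HEADER, traceId);
    }

    public static void main(String[] args) {
        String id = traceIdFor(Map.of()); // edge service: no incoming trace yet
        System.out.println("[" + id + "] order-service: request received");
        String downstream = traceIdFor(outgoingHeaders(id)); // next hop reuses the same id
        System.out.println("[" + downstream + "] payment-service: charging card");
    }
}
```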
Also, there are many resources that discuss architecture in general and why it is necessary to change your mindset when running distributed services.
Hope this helps.
Update 2021-02-23
Today, Kubernetes is pretty much the de facto standard for scaling. It is preferred because of its rich feature set, which lets you keep your app focused purely on business domain logic and drop things like Spring Cloud for service discovery. If you can use a managed Kubernetes offering from a public cloud, such as EKS or GKE, you are better off not managing the clusters yourself.
It provides autoscaling and built-in health checks. Starting with Spring Boot 2.3, you get many added benefits for running Spring Boot on Kubernetes, such as dedicated health check endpoints for liveness and readiness probes, graceful shutdown, etc.
On the database side, aim for something that is managed and scales easily such as AWS Aurora or similar.
Another important aspect of managing Spring Boot services at scale is configuration management. A very useful out-of-the-box solution is Consul. It lets you hot-reload configuration, which matters when you would otherwise have to restart 50 services just to change one boolean variable. Depending on how big your application is, startup can be costly in terms of time as well as CPU/memory resources.
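Here is a minimal sketch of what hot reload buys you. Readers always see one complete configuration snapshot, and a refresh swaps in the new one atomically without a restart. The Supplier stands in for a config store such as Consul's key/value store. ReloadableConfig, the flag name and the refresh trigger (a watch or a scheduled poll) are assumptions, not Consul's or Spring Cloud's API.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical hot-reloadable config: refresh() re-reads the source and swaps the
 * snapshot atomically, so readers never see a half-updated configuration.
 */
final class ReloadableConfig {
    private final Supplier<Map<String, String>> source;
    private final AtomicReference<Map<String, String>> snapshot;

    ReloadableConfig(Supplier<Map<String, String>> source) {
        this.source = source;
        this.snapshot = new AtomicReference<>(Map.copyOf(source.get()));
    }

    /** Re-read the source and swap the snapshot; returns true if anything changed. */
    boolean refresh() {
        Map<String, String> latest = Map.copyOf(source.get());
        return !latest.equals(snapshot.getAndSet(latest));
    }

    boolean flag(String key, boolean defaultValue) {
        String value = snapshot.get().get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        AtomicReference<Map<String, String>> store = new AtomicReference<>(Map.of("feature.newCheckout", "false"));
        ReloadableConfig config = new ReloadableConfig(store::get);
        System.out.println(config.flag("feature.newCheckout", false)); // false
        store.set(Map.of("feature.newCheckout", "true")); // operator flips the flag in the store
        config.refresh(); // e.g. triggered by a watch or a scheduled poll; no restart
        System.out.println(config.flag("feature.newCheckout", false)); // true
    }
}
```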