Liveness/Readiness set of health indicators for Spring Boot service running on top of Kafka Streams - spring-boot

How should health indicators be configured for a Spring Boot service running on top of Kafka Streams with a DB connection? We use Spring Cloud Stream with the Kafka Streams binder, Spring Data JPA, and Kubernetes as the container orchestrator. We have, let's say, 3 service replicas and 9 partitions for each topic. A typical service joins messages from two topics, persists data in a database, and publishes data back to another Kafka topic.
After switching to Spring Boot 2.3.1 and changing K8s liveness/readiness endpoints to the new ones:
/actuator/health/liveness
/actuator/health/readiness
we discovered that by default they do not have any health indicators included.
According to documentation:
Actuator configures the "liveness" and "readiness" probes as Health
Groups; this means that all the Health Groups features are available
for them. (...) By default, Spring Boot does not add other Health
Indicators to these groups.
I believe that this is the right approach, but I have not tested that:
management.endpoint.health.group.readiness.include: readinessState,db,binders
management.endpoint.health.group.liveness.include: livenessState,ping,diskSpace
We try to cover the following use cases:
rolling update: no consumption slot is available (idle instance) when a new replica is added
the stream has died (a runtime exception has been thrown)
the DB is not available during container startup or while the service is running
the broker is not available
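One way to reason about these use cases is to model how a health group resolves to a single status. The sketch below is a dependency-free Java model, not Spring Boot's implementation. It assumes a group is reported as UP only when every included indicator is UP, which simplifies the real status aggregation. The indicator names mirror the configuration above; `ProbeGroups` and `groupUp` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

/** Toy model of health groups; hypothetical names, not Spring Boot code. */
final class ProbeGroups {
    // Indicator names copied from the configuration in the question.
    static final List<String> READINESS = List.of("readinessState", "db", "binders");
    static final List<String> LIVENESS = List.of("livenessState", "ping", "diskSpace");

    /** Assumption: a group is UP only if every included indicator reports UP. */
    static boolean groupUp(List<String> group, Map<String, Boolean> indicatorUp) {
        for (String name : group) {
            // An indicator missing from the map is treated as DOWN in this sketch.
            if (!indicatorUp.getOrDefault(name, false)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Scenario: the database is unreachable while everything else is healthy.
        Map<String, Boolean> status = Map.of(
                "readinessState", true, "db", false, "binders", true,
                "livenessState", true, "ping", true, "diskSpace", true);
        System.out.println("readiness up: " + groupUp(READINESS, status));
        System.out.println("liveness up:  " + groupUp(LIVENESS, status));
    }
}
```

With this grouping, a database outage would make the pod not ready without failing liveness, so Kubernetes would stop considering it ready but would not restart the container.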
I have found a similar question; however, I believe this one is specifically about Kafka services, which are different in nature from REST services.
Update:
In Spring Boot 2.3.1 the binders health indicator checks whether streams are in the RUNNING or REBALANCING state for Kafka 2.5 (previously only RUNNING), so I guess the rolling-update case with an idle instance is handled by its logic.
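The rule described in this update can be written down as a small state check. This is a minimal sketch, not the binder's code: `StreamState` is a local stand-in modelled on the states of Kafka Streams' `KafkaStreams.State`, and `StreamsHealthRule`, `OLD_RULE` and `NEW_RULE` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of the "RUNNING or REBALANCING counts as healthy" rule; not binder code. */
final class StreamsHealthRule {
    /** Local stand-in modelled on KafkaStreams.State. */
    enum StreamState { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    // Earlier behaviour as described in the question: only RUNNING is healthy.
    static final Set<StreamState> OLD_RULE = EnumSet.of(StreamState.RUNNING);
    // Newer behaviour as described in the question: REBALANCING is accepted too.
    static final Set<StreamState> NEW_RULE =
            EnumSet.of(StreamState.RUNNING, StreamState.REBALANCING);

    static boolean isUp(Set<StreamState> healthyStates, StreamState current) {
        return healthyStates.contains(current);
    }

    public static void main(String[] args) {
        for (StreamState state : StreamState.values()) {
            System.out.println(state + " old=" + isUp(OLD_RULE, state)
                    + " new=" + isUp(NEW_RULE, state));
        }
    }
}
```

Under the newer rule a rebalancing instance still reports as healthy, while a stream that has died (ERROR or NOT_RUNNING in this model) does not.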

Related

Spring boot application stop serving traffic while Kafka consumer rebalancing

I'm running Spring Boot applications in a k8s cluster with Kafka.
During a rolling update or when scaling my services, some of them rebalance, which is OK since consumers are being added or removed, but this causes the service that is rebalancing to stop serving traffic.
I'm using
Spring boot 2.1.1.RELEASE
Spring Integration Kafka 3.1.0.RELEASE
Spring Kafka 2.2.7.RELEASE
I have 3 topics, each with 2000 partitions, and 30-50 service instances depending on the system load.
I'm using a consumer group for each topic.
At first I thought that new services were signaling that they are ready (via the Actuator readiness probe), which would cause them to accept traffic before they are actually ready, but that's not the case, since the existing ones also stop serving traffic while they are rebalancing.
What are the best practices for scaling or rolling updates that trigger as little rebalancing as possible?
Boot 2.1 is end of life. The last release, last month, was 2.1.18. The current 2.2.x release of spring-kafka is 2.2.14.
If you can upgrade to (at least) Boot 2.2.11 (spring-kafka 2.4.11 - Boot brings in 2.3.x by default) (and a broker >= 2.3), you could consider configuring incremental cooperative rebalancing.
Current releases are Boot 2.4.0 and spring-kafka 2.6.3.
https://www.confluent.io/blog/incremental-cooperative-rebalancing-in-kafka/
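As a rough illustration of what configuring incremental cooperative rebalancing can look like on the consumer side, the sketch below builds plain consumer properties that select Kafka's `CooperativeStickyAssignor`. The class name `CooperativeRebalanceConfig`, the broker address and the group id are placeholders. In a Spring Boot application the same key would more likely be supplied through `spring.kafka.consumer.properties.*` than built by hand.

```java
import java.util.Properties;

/** Sketch: consumer properties selecting the cooperative sticky assignor. */
final class CooperativeRebalanceConfig {
    static Properties consumerProperties(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Replaces the default (eager) assignment strategy with the cooperative one.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the real broker list and group id.
        Properties props = consumerProperties("localhost:9092", "example-group");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```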

How to find the processing time of Kafka messages?

I have an application running Kafka consumers and want to monitor the processing time of each message consumed from the topic. The application is a Spring Boot application and exposes Kafka consumer metrics on the Spring Actuator Prometheus endpoint using a Micrometer registry.
Can I use kafka_consumer_commit_latency_avg_seconds or kafka_consumer_commit_latency_max_seconds to monitor or alert?
Those metrics have nothing to do with record processing time. spring-kafka provides metrics for that; see here.
Monitoring Listener Performance
Starting with version 2.3, the listener container will automatically create and update Micrometer Timers for the listener, if Micrometer is detected on the class path, and a single MeterRegistry is present in the application context. The timers can be disabled by setting the ContainerProperty micrometerEnabled to false.
Two timers are maintained - one for successful calls to the listener and one for failures.
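To make the "one timer for successes, one for failures" idea concrete, here is a dependency-free sketch of what such timers measure. It is not spring-kafka's implementation and does not use Micrometer. `ListenerTimingSketch` and its members are hypothetical names, and the listener is modelled as a plain `Consumer`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Illustration only: times each listener call and files it under success or failure. */
final class ListenerTimingSketch {
    final AtomicLong successCount = new AtomicLong();
    final AtomicLong successNanos = new AtomicLong();
    final AtomicLong failureCount = new AtomicLong();
    final AtomicLong failureNanos = new AtomicLong();

    /** Calls the listener, records the elapsed time, and rethrows any failure. */
    <T> void invoke(Consumer<T> listener, T record) {
        long start = System.nanoTime();
        try {
            listener.accept(record);
            successNanos.addAndGet(System.nanoTime() - start);
            successCount.incrementAndGet();
        } catch (RuntimeException e) {
            failureNanos.addAndGet(System.nanoTime() - start);
            failureCount.incrementAndGet();
            throw e;
        }
    }

    /** Mean processing time of successful calls in milliseconds, or 0 if there were none. */
    double meanSuccessMillis() {
        long calls = successCount.get();
        return calls == 0 ? 0.0 : successNanos.get() / 1_000_000.0 / calls;
    }

    public static void main(String[] args) {
        ListenerTimingSketch timing = new ListenerTimingSketch();
        timing.invoke(record -> { /* process the record here */ }, "payload");
        System.out.println("successful calls: " + timing.successCount.get());
        System.out.println("mean success ms:  " + timing.meanSuccessMillis());
    }
}
```

This is the per-record processing time the question asks about, as opposed to commit latency, which only measures how long offset commits take.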

Avoid multiple listens to ActiveMQ topic with Spring Boot microservice instances

We have configured our ActiveMQ message broker as a Spring Boot project and there's another Spring Boot application (let's call it service-A) that has a listener configured to listen to some topics using the @JmsListener annotation. It's a Spring Cloud microservice application.
The problem:
It is possible that service-A can have multiple instances running.
If we have 2 instances running, then any message coming on topic gets listened to twice.
How can we avoid every instance listening to the topic?
We want to make sure that each message on the topic is handled only once, no matter the number of service-A instances.
Is it possible to run the microservice in a cluster mode or something similar? I also checked out ActiveMQ virtual destinations but not too sure if that's the solution to the problem.
We have also thought of an approach where we can decide who's the leader node from the multiple instances, but that's the last resort and we are looking for a cleaner approach.
Any useful pointers, references are welcome.
What you really want is a shared topic subscription which was added in JMS 2. Unfortunately ActiveMQ 5.x doesn't support JMS 2. However, ActiveMQ Artemis does.
ActiveMQ Artemis is the next generation broker from ActiveMQ. It supports most of the same features as ActiveMQ 5.x (including full support for OpenWire clients) as well as many other features that 5.x doesn't support (e.g. JMS 2, shared-nothing high-availability using replication, last-value queues, ring queues, metrics plugins for integration with tools like Prometheus, duplicate message detection, etc.). Furthermore, ActiveMQ Artemis is built on a high-performance, non-blocking core which means scalability is much better as well.
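The difference a shared subscription makes can be shown with a toy model. The sketch below is plain Java, not JMS or Artemis code: every distinct subscription name receives its own copy of a message, and consumers attached to the same subscription name split the messages between them. Round-robin distribution is an assumption made for the example, and `SharedSubscriptionModel` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of topic delivery with named subscriptions; not a JMS implementation. */
final class SharedSubscriptionModel {
    // Subscription name -> inboxes of the consumers attached to that subscription.
    private final Map<String, List<List<String>>> subscriptions = new LinkedHashMap<>();
    // Subscription name -> running count used for round-robin selection.
    private final Map<String, Integer> delivered = new LinkedHashMap<>();

    /** Attaches a consumer to the named subscription and returns its inbox. */
    List<String> subscribe(String subscriptionName) {
        List<String> inbox = new ArrayList<>();
        subscriptions.computeIfAbsent(subscriptionName, k -> new ArrayList<>()).add(inbox);
        delivered.putIfAbsent(subscriptionName, 0);
        return inbox;
    }

    /** Each subscription gets the message once; one of its consumers receives it. */
    void publish(String message) {
        for (Map.Entry<String, List<List<String>>> entry : subscriptions.entrySet()) {
            List<List<String>> consumers = entry.getValue();
            int count = delivered.get(entry.getKey());
            consumers.get(count % consumers.size()).add(message);
            delivered.put(entry.getKey(), count + 1);
        }
    }

    public static void main(String[] args) {
        SharedSubscriptionModel topic = new SharedSubscriptionModel();
        // Two service-A instances attach to the same subscription name.
        List<String> instance1 = topic.subscribe("service-A");
        List<String> instance2 = topic.subscribe("service-A");
        topic.publish("m1");
        topic.publish("m2");
        System.out.println("instance 1 received " + instance1);
        System.out.println("instance 2 received " + instance2);
    }
}
```

If each instance instead used its own subscription name, both would receive every message, which is the duplicate handling described in the question.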

How Spring BOOT Logger Actuator behaves in clustered environment?

I have a query related to Spring Boot Actuator. Through Actuator I can change the log level dynamically.
How does this work in a clustered environment?
If I make the REST (POST) call to change the log level, on which node will it be applied?
Or will it be applied to all the nodes?
If it gets applied to all the nodes in the cluster then how to restrict it to only a particular node?
You should use an external configuration server (Spring Cloud Config) and Spring Cloud Bus to propagate configuration changes to all the servers in your cluster.
Place your log configuration on the configuration server; on each change, a message is sent through a message broker (such as RabbitMQ) to all the servers listening to the config.
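For the narrower case in the question of targeting one particular node, note that an Actuator request changes the log level only in the instance that handles it, so addressing a node directly is one option. The sketch below only builds the requests (nothing is sent) for the Actuator `loggers` endpoint using the JDK HTTP client, which assumes Java 11 or later. The node URLs, the logger name and the class `LogLevelRequests` are hypothetical, and the endpoint must be exposed over HTTP for such a call to work.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

/** Sketch: builds POST requests for /actuator/loggers/{name} on specific nodes. */
final class LogLevelRequests {
    /** Request that sets the level of one logger on the node at baseUrl. */
    static HttpRequest forNode(String baseUrl, String loggerName, String level) {
        String body = "{\"configuredLevel\":\"" + level + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/loggers/" + loggerName))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** One request per node, for the case where several nodes should change. */
    static List<HttpRequest> forNodes(List<String> baseUrls, String loggerName, String level) {
        List<HttpRequest> requests = new ArrayList<>();
        for (String baseUrl : baseUrls) {
            requests.add(forNode(baseUrl, loggerName, level));
        }
        return requests;
    }

    public static void main(String[] args) {
        // Hypothetical node addresses; each request would reach exactly one instance.
        List<String> nodes = List.of("http://node-1:8080", "http://node-2:8080");
        for (HttpRequest request : forNodes(nodes, "com.example", "DEBUG")) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

The requests could then be sent with `java.net.http.HttpClient`. If the nodes sit behind a load balancer, a single call through it would reach only whichever instance the balancer selects.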

Spring cloud bus - rabbitmq unavailability marks the instance DOWN

I use spring cloud config bus (rabbitmq) in my micro-service. Only purpose for me to use rabbitmq in my microservice is spring cloud bus... I have 2 questions below.
When I was experimenting, I found that spring expects rabbitmq to be UP and running during application start. Which is contrary to what Spring cloud evangelises... (Circuit breakers...) To be fair, even service discovery is not expected to be up and running before starting an application. Is there any sensible reason behind this...?
Say, I start my application when rabbitmq is up and running. For some reason, rabbitmq goes down... What I should be losing is just my ability to work with rabbitmq... instead, /health endpoint responds back as DOWN for my micro-service. Any eureka instance listening to heart beats from my micro-service is also marking the instance as down. Any reasons for doing this...?
To my knowledge, this is against the circuit breaker pattern that spring cloud has evangelised.
I personally feel that spring cloud config bus is not an important feature to mark an application as down...
Are there any alternatives for telling my Spring Boot micro-service that the connection to rabbitmq is not a critical service?
Thanks in advance!
