Spring Boot application stops serving traffic during Kafka consumer rebalancing - spring-boot

I'm running Spring Boot applications in a Kubernetes (k8s) cluster with Kafka.
During a rolling update or when scaling my services, some of them rebalance, which is expected since consumers are being added or removed, but this causes the service that is rebalancing to stop serving traffic.
I'm using
Spring boot 2.1.1.RELEASE
Spring Integration Kafka 3.1.0.RELEASE
Spring Kafka 2.2.7.RELEASE
I have 3 topics, each with 2,000 partitions; there are 30-50 service instances depending on the system load.
Each topic has its own consumer group.
First I thought that new services were signaling that they were ready (via the Actuator readiness probe), causing them to accept traffic before they were actually ready, but that's not the case, since the existing ones also stop serving traffic while they are rebalancing.
What are the best practices for scaling or rolling updates that will trigger the minimum possible rebalancing?

Boot 2.1 is end of life. The last release, last month, was 2.1.18. The current 2.2.x release of spring-kafka is 2.2.14.
If you can upgrade to (at least) Boot 2.2.11 (spring-kafka 2.4.11 - Boot brings in 2.3.x by default) (and a broker >= 2.3), you could consider configuring incremental cooperative rebalancing.
Current releases are Boot 2.4.0 and spring-kafka 2.6.3.
https://www.confluent.io/blog/incremental-cooperative-rebalancing-in-kafka/
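If you do upgrade, cooperative rebalancing on the consumer side amounts to switching the partition assignor, so that a rebalance only revokes the partitions that actually move instead of stopping every consumer. A minimal sketch using Spring Boot configuration properties (assumes kafka-clients 2.4+, which ships the CooperativeStickyAssignor):

```properties
# application.properties
# Pass the assignor through to the underlying Kafka consumer.
# With the cooperative-sticky strategy, consumers keep serving their
# unaffected partitions while a rebalance is in progress.
spring.kafka.consumer.properties.partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```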

Related

Avoid multiple listens to ActiveMQ topic with Spring Boot microservice instances

We have configured our ActiveMQ message broker as a Spring Boot project and there's another Spring Boot application (let's call it service-A) that has a listener configured to listen to some topics using the @JmsListener annotation. It's a Spring Cloud microservice application.
The problem:
It is possible that service-A can have multiple instances running.
If we have 2 instances running, then any message arriving on the topic gets consumed twice.
How can we avoid every instance listening to the topic?
We want to make sure that the topic is listened to only once, no matter the number of service-A instances.
Is it possible to run the microservice in a cluster mode or something similar? I also checked out ActiveMQ virtual destinations, but I'm not sure whether that's the solution to the problem.
We have also thought of an approach where we can decide who's the leader node from the multiple instances, but that's the last resort and we are looking for a cleaner approach.
Any useful pointers, references are welcome.
What you really want is a shared topic subscription which was added in JMS 2. Unfortunately ActiveMQ 5.x doesn't support JMS 2. However, ActiveMQ Artemis does.
ActiveMQ Artemis is the next generation broker from ActiveMQ. It supports most of the same features as ActiveMQ 5.x (including full support for OpenWire clients) as well as many other features that 5.x doesn't support (e.g. JMS 2, shared-nothing high-availability using replication, last-value queues, ring queues, metrics plugins for integration with tools like Prometheus, duplicate message detection, etc.). Furthermore, ActiveMQ Artemis is built on a high-performance, non-blocking core which means scalability is much better as well.
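With a JMS 2 broker such as Artemis, a shared topic subscription can be declared from Spring JMS by enabling shared subscriptions on the listener container factory. A sketch under those assumptions (bean names, the destination, and the subscription name are illustrative; on Spring Boot 3+ the imports would be jakarta.jms instead of javax.jms):

```java
import javax.jms.ConnectionFactory;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.jms.config.DefaultJmsListenerContainerFactory;
import org.springframework.stereotype.Component;

@Configuration
class SharedTopicConfig {

    @Bean
    DefaultJmsListenerContainerFactory sharedTopicFactory(ConnectionFactory cf) {
        DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
        factory.setConnectionFactory(cf);
        factory.setPubSubDomain(true);       // listen to a topic, not a queue
        factory.setSubscriptionShared(true); // JMS 2 shared subscription
        return factory;
    }
}

@Component
class ServiceAListener {

    // Every instance of service-A uses the same subscription name, so the
    // broker delivers each message to only one instance.
    @JmsListener(destination = "some.topic",
                 containerFactory = "sharedTopicFactory",
                 subscription = "service-A-shared")
    public void onMessage(String payload) {
        // handle the message
    }
}
```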

Liveness/Readiness set of health indicators for Spring Boot service running on top of Kafka Streams

How should health indicators be properly configured for a Spring Boot service running on top of Kafka Streams with a DB connection? We use Spring Cloud Stream with the Kafka Streams binder, Spring Data JPA, and Kubernetes as the container orchestrator. We have, let's say, 3 service replicas and 9 partitions for each topic. A typical service joins messages from two topics, persists data in a database, and publishes data back to another Kafka topic.
After switching to Spring Boot 2.3.1 and changing K8s liveness/readiness endpoints to the new ones:
/actuator/health/liveness
/actuator/health/readiness
we discovered that by default they do not have any health indicators included.
According to documentation:
Actuator configures the "liveness" and "readiness" probes as Health Groups; this means that all the Health Groups features are available for them. (...) By default, Spring Boot does not add other Health Indicators to these groups.
I believe that this is the right approach, but I have not tested it:
management.endpoint.health.group.readiness.include: readinessState,db,binders
management.endpoint.health.group.liveness.include: livenessState,ping,diskSpace
We try to cover the following use cases:
rolling update: no available consumption slot (an idle instance) when a new replica is added
stream has died (runtime exception has been thrown)
DB is not available during container start up / when service is running
broker is not available
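For completeness, the Kubernetes side would point the probes at the new Actuator endpoints. A minimal sketch of the container spec (the port and timing values are illustrative assumptions):

```yaml
# Fragment of a Deployment's container spec
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```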
I have found a similar question; however, I believe the current one is specifically related to Kafka services, which are different in nature from REST services.
Update:
In Spring Boot 2.3.1 the binders health indicator checks whether streams are in the RUNNING or REBALANCING state for Kafka 2.5 (previously only RUNNING), so I guess the rolling-update case with an idle instance is handled by its logic.

Duplicate consumption of messages with the Spring Cloud Stream Kafka binder

We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.
Occasionally, we observe bursts of duplicate messages received by a consumer - often several days after it was first consumed and processed (successfully).
While I understand that Kafka does not guarantee exactly-once delivery, it still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer interacts with external APIs, it is a bit difficult to make it idempotent.
Any hints what might be the cause of duplication? What should I be looking for to figure this out?
We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (version of the other services might be a bit different).
You should show your configuration when asking questions like this.
Best guess is the broker's offsets.retention.minutes.
With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.
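The mechanism: if the committed offsets of a group expire while consumers are connected but a partition sits unconsumed (or the group is briefly gone), the consumer falls back to its auto.offset.reset policy and re-reads old records, which looks exactly like a burst of duplicates days later. A broker-side configuration sketch raising the retention (the value shown is the modern 7-day default):

```properties
# server.properties (broker side)
# Committed consumer-group offsets are discarded after this interval.
# 10080 minutes = 7 days (the default since Kafka 2.0; older brokers kept
# offsets for only 1 day, i.e. 1440 minutes).
offsets.retention.minutes=10080
```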

How to detect that there are no more messages in a Kafka topic/partition & read only after writing to the topic is done

I'm using Spring boot version 1.5.4.RELEASE & spring Kafka version 1.3.8.RELEASE.
Some generic questions:
Is there a way for a consumer to find out that there are no more messages in a topic/partition?
How can a consumer be started so that it begins consuming messages from a topic only after the producer has finished writing?
Spring Boot 1.5 is end of life and no longer supported; the current version is 2.2.5.
The latest 1.3.x version of Spring for Apache Kafka is 1.3.10. It will only be supported through the end of this year.
You should plan on upgrading.
You can start and stop containers using the KafkaListenerEndpointRegistry bean; set autoStartup to false on the container factory.
See Detecting Idle and Non-Responsive Consumers.
While efficient, one problem with asynchronous consumers is detecting when they are idle - users might want to take some action if no messages arrive for some period of time.
You can configure the listener container to publish a ListenerContainerIdleEvent when some time passes with no message delivery. While the container is idle, an event will be published every idleEventInterval milliseconds.
...
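The two suggestions above can be combined: start the container only once the producer is known to be done, and use the idle event to detect that the backlog is drained. A sketch under those assumptions (the listener id, topic name, and 60-second idle interval are illustrative):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.EventListener;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.event.ListenerContainerIdleEvent;
import org.springframework.stereotype.Component;

@Configuration
class ListenerConfig {

    @Bean
    ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setAutoStartup(false); // don't consume until explicitly started
        // Publish a ListenerContainerIdleEvent after 60s with no records.
        factory.getContainerProperties().setIdleEventInterval(60_000L);
        return factory;
    }
}

@Component
class BatchConsumer {

    private final KafkaListenerEndpointRegistry registry;

    BatchConsumer(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    @KafkaListener(id = "batchListener", topics = "my-topic")
    public void listen(String message) {
        // process the message
    }

    // Call this once the producer signals it has finished writing.
    public void startConsuming() {
        registry.getListenerContainer("batchListener").start();
    }

    @EventListener
    public void onIdle(ListenerContainerIdleEvent event) {
        // No records arrived for idleEventInterval ms - likely caught up.
        registry.getListenerContainer("batchListener").stop();
    }
}
```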

Spring cloud bus - rabbitmq unavailability marks the instance DOWN

I use the Spring Cloud Config Bus (RabbitMQ) in my microservice. The only purpose of RabbitMQ in my microservice is Spring Cloud Bus. I have 2 questions below.
When I was experimenting, I found that Spring expects RabbitMQ to be up and running during application start, which is contrary to what Spring Cloud evangelises (circuit breakers...). To be fair, even service discovery is not expected to be up and running before starting an application. Is there any sensible reason behind this?
Say I start my application when RabbitMQ is up and running. For some reason, RabbitMQ goes down. What I should be losing is just the ability to work with RabbitMQ; instead, the /health endpoint responds with DOWN for my microservice, and any Eureka instance listening to heartbeats from my microservice also marks the instance as down. Any reasons for doing this?
To my knowledge, this is against the circuit breaker pattern that Spring Cloud has evangelised.
I personally feel that the Spring Cloud Config Bus is not an important enough feature to mark an application as down...
Are there any alternatives to tell my Spring Boot microservice that the connection to RabbitMQ is not a critical dependency?
Thanks in advance!
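One commonly suggested workaround (a sketch, not tested against this exact setup, though the property is part of standard Spring Boot health auto-configuration) is to exclude the RabbitMQ health indicator, so a broker outage no longer flips the aggregate /health status:

```properties
# application.properties
# Stop the RabbitMQ health check from contributing to /health, so an
# outage of the Bus broker doesn't mark the whole instance DOWN
# (and doesn't propagate DOWN to Eureka via the health check).
management.health.rabbit.enabled=false
```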