Spring Cloud Stream Kafka Streams Binder processor application stuck - apache-kafka-streams

I have the following Spring Cloud Stream Kafka Streams Binder 3.x application:
When I run X messages through this application by publishing them to topic1 from an integration test using @SpringBootTest and @EmbeddedKafka, the counts of messages at points 1 and 2 are equal, as I expect.
When I do the same with the live application connected to the Kafka broker, the counts at points 1 and 2 remain significantly different: Count1 >> Count2.
Kafka Tool shows a large lag for the Processor2 consumer on topic2, and that lag remains constant (it does not change after I stop publishing messages).
Processor2 consists of:
a stateful flatTransform transformer
an aggregator
other downstream steps
What could be the reason for the different behaviour between test and live mode, and for the lag not going down in live mode?
I have thoroughly compared all application property values active in the test and in the live application; they are exactly equivalent.
There is only 1 partition in all topics in both cases.

In my case the reason was the default 7-day retention setting of the topics that were automatically created by the Spring Cloud Stream application.
The messages in my input stream span 8 years, and I am using a custom TimestampExtractor. Because the records keep those old timestamps, the broker treats them as already past retention and removes them from the intermediate topic before Processor2 gets to read them.
After I manually configured the topics with a large retention time, the issue was solved:
/usr/bin/kafka-configs --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name topic2 --add-config retention.ms=315360000000
Alternatively, set log.retention.hours for the entire Kafka broker.
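For reference, the same topic-level change can also be made programmatically. Below is a minimal sketch using the Kafka AdminClient; the topic name and retention value simply mirror the command above, and incrementalAlterConfigs needs kafka-clients 2.3 or later.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class IncreaseTopicRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // topic2 is the intermediate topic from the question; 315360000000 ms is roughly 10 years
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic2");
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "315360000000"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
        }
    }
}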

Related

Consumer issues Spring Cloud Rabbit Stream - Cloud Foundry

I have a Spring Cloud Data Flow stream deployed in PCF using Rabbit as the binder. I have multiple processors in the pipeline. Occasionally I see issues wherein a partitioned consumer does not consume messages from Rabbit until the consumer is restarted. For instance, within my stream I have a processor, foo, that has partitioned input with 10 partitions. All partitions consume messages without issues 99% of the time. On rare occasions, one partition is not drained. When the instance listening to that partition is terminated and recreated, all works well again. Is there a mechanism to capture these issues? Will listening to ListenerContainerConsumerFailedEvent help in detecting such issues? Is there a preferred way to recover from such issues?
A sample stream definition is as follows:
Source | foo | bar | Sink
Deployment properties:
app.Source.spring.cloud.stream.bindings.output.producer.partition-key-expression=headers['PARTITION_KEY']
app.Source.spring.cloud.stream.bindings.output.producer.partition-count=10
app.foo.spring.cloud.stream.bindings.input.consumer.partitioned=true
app.foo.spring.cloud.stream.instanceCount=10
deployer.foo.count=10
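Regarding ListenerContainerConsumerFailedEvent: subscribing to that application event is a reasonable way to notice a consumer that has stopped consuming. Below is a minimal sketch, assuming Spring AMQP's ListenerContainerConsumerFailedEvent is the event published by the Rabbit binder's listener container; the logging is a placeholder for whatever alerting or restart logic you prefer.

import org.springframework.amqp.rabbit.listener.ListenerContainerConsumerFailedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class ConsumerFailureMonitor {

    @EventListener
    public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
        // isFatal() == true means the container gave up and will not retry on its own,
        // which is the situation that needs an alert or a restart of the instance.
        System.err.println("Rabbit consumer failed: reason=" + event.getReason()
                + ", fatal=" + event.isFatal());
    }
}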

Intermittent issue with Kafka (AWS MSK) consumer

We are facing a strange issue in only one of our environments (with the same consumer app).
Basically, a lag suddenly starts to build up on only one of the topics on the Kafka broker (it has multiple topics), which has 10 consumer members under a single consumer group.
Multiple restarts, adding another pod of the consumer application, and changing default configuration properties (max poll records, session timeout) have NOT helped much so far.
Looking for any suggestions or advice on how to debug the issue (we tried enabling Apache logs, CloudWatch, etc., but so far we have only seen that regular/periodic rebalancing is happening, even for a very low load of 7k messages waiting for processing).
Below are env details:
App - Spring Boot app on version 2.7.2
Platform - AWS MSK (managed Kafka)
Kafka broker - 3 brokers (version 2.8.x)
Consumer group - 1, with 15 members (8 partitions, 1 topic)
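One way to see when and why the group rebalances is to attach a ConsumerRebalanceListener and log every revocation and assignment. The sketch below uses the plain kafka-clients API; the bootstrap address, group and topic names are placeholders, and in a Spring Boot app the same information also shows up in the consumer coordinator INFO logs.

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceLogger {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-consumer-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Frequent revocations usually mean the poll loop exceeds max.poll.interval.ms
                    // or heartbeats are missed (session.timeout.ms).
                    System.out.println("Revoked: " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    System.out.println("Assigned: " + partitions);
                }
            });
            while (true) {
                consumer.poll(Duration.ofSeconds(1)); // record processing omitted in this sketch
            }
        }
    }
}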

Is there any configuration available to clear uncommitted messages during a Kafka or consumer restart?

I have a business scenario where the consumers should not consume the committed/uncommitted messages from the topic
when the consumer or Kafka restarts. I tried applying auto.offset.reset: latest, but it still pulls the uncommitted offsets from the topic. For example, take an application with one instance, 1 topic and 1 partition. Suppose I posted 10 messages; the consumer picked up 5 messages and committed the offset. Now I restart either my consumer instance or Kafka. After the restart it should not pick up the old 5 messages which were not committed. Looking for any other configuration or workarounds.
Use a unique group.id (e.g. a UUID) each time you start, or seekToEnd each assigned partition during startup.
See Seeking to a Specific Offset.
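A minimal sketch of the seekToEnd approach, assuming a Spring for Apache Kafka listener (the class name is a placeholder; the topic and group names are taken from the examples below):

import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.ConsumerSeekAware;
import org.springframework.stereotype.Component;

@Component
public class SkipOldRecordsListener implements ConsumerSeekAware {

    @KafkaListener(topics = "topic1", groupId = "myConsumerGroup")
    public void listen(String message) {
        // only records published after this application started arrive here
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments,
                                     ConsumerSeekCallback callback) {
        // Seek every assigned partition to its end so older records are skipped,
        // regardless of what was previously committed.
        assignments.keySet().forEach(tp -> callback.seekToEnd(tp.topic(), tp.partition()));
    }
}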
You need to ensure that your consumer gets a new consumer group (in the Java API the consumer config for this is called group.id) every time you restart your application. Even if only the broker was restarted, you would still restart your application with a new group.id. And keep the configuration auto.offset.reset=latest.
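With the plain consumer API that could look like the following sketch (the topic name and bootstrap address are placeholders):

import java.util.List;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FreshGroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // A fresh group.id on every start means no stored offsets exist for this group,
        // so auto.offset.reset=latest makes the consumer start at the end of the log.
        props.put("group.id", "my-app-" + UUID.randomUUID());
        props.put("auto.offset.reset", "latest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("topic1"));
        // poll loop omitted in this sketch
    }
}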
Another option would be to manually change the offsets of the Consumer Group after every broker restart. Kafka comes with a ConsumerGroupCommand tool. You can find some information in the Kafka documentation Managing Consumer Groups.
If you plan to reset a particular Consumer Group ("myConsumerGroup") you can use
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group myConsumerGroup --topic topic1 --to-latest --execute
Depending on your requirements, you can reset the offsets for each partition of the topic with that tool (without --execute it only does a dry run). The help function and the documentation explain the options.

Duplicate consumption of messages with Spring Cloud Stream Kafka binder

We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.
Occasionally, we observe bursts of duplicate messages received by a consumer, often several days after they were first consumed and processed (successfully).
While I understand that Kafka does not guarantee exactly-once delivery, it still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer interacts with external APIs, it is a bit difficult to make it idempotent.
Any hints as to what might be the cause of the duplication? What should I be looking for to figure this out?
We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (versions of the other services might differ slightly).
You should show your configuration when asking questions like this.
Best guess is the broker's offsets.retention.minutes.
With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.
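To verify what the cluster actually uses, the broker config can be read with the AdminClient; a sketch is below (broker id "0" and the bootstrap address are placeholders). If the committed offsets expire, the consumer falls back to auto.offset.reset the next time it needs them, which with earliest would re-deliver old records.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class CheckOffsetsRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Map<ConfigResource, Config> configs = admin.describeConfigs(List.of(broker)).all().get();
            System.out.println("offsets.retention.minutes = "
                    + configs.get(broker).get("offsets.retention.minutes").value());
        }
    }
}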

Messages published to all consumers with the same consumer group in a spring-cloud-stream project

I have ZooKeeper and 3 Kafka brokers running locally.
I started one producer and one consumer, and I can see the consumer consuming messages.
I then started three consumers with the same consumer group name (on different ports, since it is a Spring Boot project). What I found is that all of the consumers are now consuming (receiving) messages. But I expect the messages to be load-balanced, i.e. a message should not be repeated across the consumers. I don't know what the problem is.
Here is my property file:
spring.cloud.stream.bindings.input.destination=timerTopicLocal
spring.cloud.stream.kafka.binder.zkNodes=localhost
spring.cloud.stream.kafka.binder.brokers=localhost
spring.cloud.stream.bindings.input.group=timerGroup
Here the group is timerGroup.
consumer code : https://github.com/codecentric/edmp-sample-stream-sink
producer code : https://github.com/codecentric/edmp-sample-stream-source
Can you please update the dependencies to Camden.RELEASE (and start using Kafka 0.9+)? In Brixton.RELEASE, Kafka consumers were 0.8-based and required passing instanceIndex/instanceCount as properties in order to distribute partitions correctly.
In Camden.RELEASE we are using the Kafka 0.9+ consumer client, which does load balancing in the way you are expecting (we also support static partition allocation via instanceIndex/instanceCount, but I suspect this is not what you want). I can go into more detail on how to configure this with Brixton, but I guess an upgrade would be a much easier path.
