Spring Kafka - increase in concurrency produces more duplicates

Topic A was created with 12 partitions,
and a Spring Kafka consumer was started as a Spring Boot application with concurrency 10. When putting 1,000 messages into the topic there were no duplicates: all 1,000 messages were consumed once. But under a heavier load of 10,000 messages at 100 TPS, the consumer received the 10,000 messages plus about 8,500 duplicates with concurrency 10; with concurrency 1 it works fine (no duplicates found).
Enable auto commit is false and we do a manual ack after processing each message.
Processing time for one message is about 300 milliseconds.
Consumer rebalances are occurring, and the duplicates are produced because of that.
How can we overcome this situation when handling a higher message volume in Kafka?
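For reference, a minimal sketch of the setup described above with recent spring-kafka (enable.auto.commit=false, manual ack and concurrency 10 come from the question; the broker address, topic/group names, factory wiring and max.poll values are assumptions). The key point is keeping max.poll.records * per-record time well below max.poll.interval.ms so a slow listener does not trigger the rebalances that cause redelivery:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;

@Configuration
public class ConsumerConfiguration {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "topic-a-group");             // assumption
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);             // from the question
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // With ~300 ms per message, 100 records * 300 ms = 30 s, well below 5 minutes,
        // so one slow batch cannot push the consumer out of the group.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);                 // assumption
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000);         // assumption
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setConcurrency(10);                                             // 10 consumers for 12 partitions
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        return factory;
    }

    @KafkaListener(topics = "topic-a", containerFactory = "kafkaListenerContainerFactory") // assumption
    public void listen(String message, Acknowledgment ack) {
        // ~300 ms of business processing per message, then commit manually.
        ack.acknowledge();
    }
}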

Related

How to limit the message consumption rate of a Kafka consumer in Spring Boot? (Kafka Streams)

I want to limit my Kafka consumer's message consumption rate to 1 message per 10 seconds. I'm using Kafka Streams in Spring Boot.
Following are the properties I tried to make this work, but it didn't work as expected (it consumed many messages at once).
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokersUrl);
config.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
//
config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,1);
config.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 10000);
Is there any way to manually ack (manual offset commits) in Kafka Streams? That would be useful for controlling the message consumption rate.
Please note that I'm using KStreams (Kafka Streams).
Any help is really appreciated. :)
I think you misunderstand what MAX_POLL_INTERVAL_MS_CONFIG actually does.
That is the max allowed time for the client to read an event.
From the docs:
controls the maximum time between poll invocations before the consumer will proactively leave the group (5 minutes by default). The value of the configuration request.timeout.ms (default 30 seconds) must always be smaller than max.poll.interval.ms (default 5 minutes), since that is the maximum time that a JoinGroup request can block on the server while the consumer is rebalancing.
It says "maximum time", not any "delay", between poll invocations.
Kafka Streams polls constantly; you cannot easily pause/resume it to delay record polling.
To read an event every 10 seconds without falling out of the group due to missed heartbeats, you should use the plain Consumer API: call pause(), sleep for 10 seconds (e.g. Thread.sleep), then resume() and poll(), while setting max.poll.records=1.
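A minimal sketch of that suggestion with the plain consumer API (broker address, group and topic names are assumptions): fetch at most one record per poll, then pause the partitions, wait 10 seconds, and resume.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ThrottledConsumer {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "throttled-group");           // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);                   // at most one record per poll

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("input-topic"));       // assumption
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.printf("%s-%d@%d: %s%n",
                        r.topic(), r.partition(), r.offset(), r.value()));
                // Throttle: pause the assigned partitions and wait 10 seconds before the
                // next fetch. Heartbeats run on a background thread and 10 s is well below
                // the default max.poll.interval.ms (5 min), so group membership is kept.
                consumer.pause(consumer.assignment());
                Thread.sleep(Duration.ofSeconds(10).toMillis());
                consumer.resume(consumer.assignment());
            }
        }
    }
}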
Finally, I achieved the desired message consumption limit using Thread.sleep().
Since there is no way to control the consumption rate with Kafka config properties alone, I had to use my application code to control the rate of consumption.
Example: if I want to limit the consumption rate to, say, 4 messages per 10 seconds, I consume 4 messages (keeping a count in parallel); once 4 records have been consumed, I make the thread sleep for 10 seconds and then repeat the same process again.
I know it's not a good solution, but there was no other way.
thank you OneCricketeer
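For reference, a minimal sketch of that count-and-sleep workaround applied inside a Kafka Streams topology (application id, broker and topic names are illustrative assumptions). Keeping max.poll.records small bounds how long one poll's batch can block the stream thread, so the sleeps stay safely under max.poll.interval.ms:

import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ThrottledStream {

    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "throttled-stream");     // assumption
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumption
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Keep each poll small so the sleeps below cannot exceed max.poll.interval.ms.
        config.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 4);

        AtomicInteger counter = new AtomicInteger();

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic")                            // assumption
               .foreach((key, value) -> {
                   System.out.println("processed " + key + " -> " + value);
                   // After every 4th record, block the stream thread for 10 seconds,
                   // back-pressuring consumption to roughly 4 messages per 10 seconds.
                   if (counter.incrementAndGet() % 4 == 0) {
                       try {
                           Thread.sleep(Duration.ofSeconds(10).toMillis());
                       } catch (InterruptedException e) {
                           Thread.currentThread().interrupt();
                       }
                   }
               });

        new KafkaStreams(builder.build(), config).start();
    }
}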

Spring Cloud Stream - Kafka consumer consuming duplicate messages with StreamListener

With our Spring Boot app, we notice the Kafka consumer consuming a message twice, randomly, once in a while, and only in the prod environment. We have 6 instances with 6 partitions deployed in PCF. We have caught messages with the same offset and partition received twice on the same topic, which causes duplicates and is business critical for us.
We haven't noticed this in the non-production environments, and it is hard to reproduce there. We recently switched to Kafka and have not been able to find the root cause.
We are using spring-cloud-stream / spring-cloud-stream-binder-kafka 2.1.2.
Here is the Config:
spring:
  cloud:
    stream:
      default.consumer.concurrency: 1
      default-binder: kafka
      bindings:
        channel:
          destination: topic
          content_type: application/json
          autoCreateTopics: false
          group: group
          consumer:
            maxAttempts: 1
      kafka:
        binder:
          autoCreateTopics: false
          autoAddPartitions: false
          brokers: brokers list
        bindings:
          channel:
            consumer:
              autoCommitOnError: true
              autoCommitOffset: true
              configuration:
                max.poll.interval.ms: 1000000
                max.poll.records: 1
                group.id: group
We use @StreamListener to consume the messages.
Here is an instance where we received a duplicate, together with the error message captured in the server logs.
ERROR 46 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-3, groupId=group] Offset commit failed on partition topic-0 at offset 1291358: The coordinator is not aware of this member.
ERROR 46 --- [container-0-C-1] o.s.kafka.listener.LoggingErrorHandler : Error while processing: null
OUT org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:871) ~[kafka-clients-2.0.1.jar!/:na]
There is no crash and all the instances are healthy at the time of the duplicate. There is also some confusion with the error log - "Error while processing: null" - since the message was actually processed successfully, twice. And max.poll.interval.ms is 1000000, which is about 16 minutes and should be plenty of time for the system to process any message; the session timeout and heartbeat configs are the defaults. The duplicate is received within 2 seconds in most of the instances.
Are there any configs we are missing? Any suggestion/help is highly appreciated.
Commit cannot be completed since the group has already rebalanced
A rebalance occurred because your listener took too long; you should adjust max.poll.records and max.poll.interval.ms to make sure you can always handle the records received within the time limit.
In any case, Kafka does not guarantee exactly once delivery, only at least once delivery. You need to add idempotency to your application and detect/ignore duplicates.
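A minimal, framework-agnostic sketch of such duplicate detection (the class and method names are illustrative, and the in-memory map would need a shared, persistent store in a real deployment). In Spring Cloud Stream the topic, partition and offset of the current record can typically be read from the Kafka message headers (e.g. KafkaHeaders.OFFSET):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DuplicateFilter {

    // Highest offset already processed, keyed by "topic-partition".
    private final Map<String, Long> lastProcessedOffset = new ConcurrentHashMap<>();

    // Returns true if this record has not been processed before.
    public boolean shouldProcess(String topic, int partition, long offset) {
        Long seen = lastProcessedOffset.get(topic + "-" + partition);
        return seen == null || offset > seen;
    }

    // Call after the record has been processed successfully.
    public void markProcessed(String topic, int partition, long offset) {
        lastProcessedOffset.merge(topic + "-" + partition, offset, Math::max);
    }
}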
Also, keep in mind that StreamListener and the annotation-based programming model have been deprecated for 3+ years and have been removed from the current main branch, which means the next release will not have them. Please migrate your solution to the functional programming model.
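As a rough illustration of the functional model (bean and binding names are assumptions), a Consumer<String> bean named consume maps to the binding consume-in-0:

import java.util.function.Consumer;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FunctionalConsumerConfig {

    @Bean
    public Consumer<String> consume() {
        return payload -> {
            // business processing; keep it idempotent, since delivery is at-least-once
            System.out.println("received: " + payload);
        };
    }
}

In application.yml this is then bound via spring.cloud.stream.bindings.consume-in-0.destination and spring.cloud.stream.bindings.consume-in-0.group, instead of the channel binding shown in the question.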

ConcurrentMessageListenerContainer in Spring Kafka is not working

Topic A is created with 12 partitions,
and in Spring Kafka the concurrency is set to 4. I can see 4 client IDs assigned to the 12 partitions (3 each).
Containers are also created for the concurrency of 4, but when consuming data from the topic in the listeners, they consume sequentially, not in parallel.
Example:
Consumer 1-C completes processing its data, then
Consumer 2-C starts and completes, then
Consumer 3-C starts and completes,
then
Consumer 4-C ...
Consumer 1-C
But instead I want
Consumer 1-C, Consumer 2-C, Consumer 3-C and Consumer 4-C to consume data in parallel.
Check the following GitHub issue and compare it to your code:
https://github.com/spring-projects/spring-kafka/issues/247

Why did the Kafka consumer re-process all the records from the last 2 months?

In one of the instances, when the consumer service was restarted, it led to re-processing all the records that had been sent to Kafka.
Kafka broker: 0.10.0.1
Kafka producer service: Spring Boot 1.4.3.RELEASE
Kafka consumer service: Spring Boot 2.2.0.RELEASE
To investigate this issue, I want to recreate the scenario in the dev/local environment, but it is not happening there!
What can be the probable cause?
How can I check whether, once a record is processed on the consumer side, its offset is committed when we call Acknowledgment.acknowledge()?
Consumer properties:
Enable auto commit = false
Auto offset reset = earliest
max poll records = 1
max poll interval ms = I am calculating the value of this parameter at runtime from the formula (number_of_retries * x * 2) <= Integer.MAX_VALUE
Retry policy - simple
number of retries = 3
interval between retries = x (millis)
I am creating topics at runtime on the consumer side via NewTopic(topic_name, 1, (short) 1) beans.
There are 2 Kafka clusters and 1 ZooKeeper instance running.
Any help would be appreciated.
That broker is very old; with it, if a consumer receives no records for 24 hours, the offsets are removed, and restarting the consumer then causes it to reprocess all the records.
With newer brokers, this was changed to 7 days, and the consumer has to be stopped for 7 days for the offsets to be removed.
Spring Boot 1.4.x (and even 1.5.x and 2.0.x) is no longer supported; the current version is 2.3.1.
You should upgrade to a newer broker and a more recent Spring Boot version.
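For the "how to check whether the offset was actually committed" part of the question, a minimal sketch using the plain consumer API (broker address, group, topic and partition are assumptions); the same information is also available from the kafka-consumer-groups.sh --describe command:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommittedOffsetCheck {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");         // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);              // assumption
            OffsetAndMetadata committed = consumer.committed(tp);
            System.out.println(committed == null
                    ? "No committed offset for " + tp
                    : "Committed offset for " + tp + " = " + committed.offset());
        }
    }
}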

Spring Kafka consumer stops receiving messages

I have a Spring microservice using Kafka.
Here are the 5 consumer config properties:
BOOTSTRAP_SERVERS_CONFIG -> <ip>:9092
KEY_DESERIALIZER_CLASS_CONFIG -> StringDeserializer.class
VALUE_DESERIALIZER_CLASS_CONFIG -> StringDeserializer.class
GROUP_ID_CONFIG -> "Group1"
MAX_POLL_INTERVAL_MS_CONFIG -> Integer.MAX_VALUE
It has been observed that when the microservice is restarted, the Kafka consumer stops receiving messages. Please help me with this.
I believe your max.poll.interval.ms is the issue. It is set to about 24 days! This represents the time the consumer is given to process the messages from one poll. The broker will hang on for that long when the processing thread dies! Try setting it to a smaller value than Integer.MAX_VALUE, for example 30 seconds (30000 ms).
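A minimal sketch of the suggested change (the other values are carried over from the question; 30000 ms follows the answer's suggestion):

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerProps {

    static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "<ip>:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "Group1");
        // Give the listener 30 s per poll instead of Integer.MAX_VALUE; if processing
        // takes longer, the consumer leaves the group and a rebalance is triggered,
        // surfacing the problem instead of hanging the partition for ~24 days.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 30_000);
        return props;
    }
}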
