Multiple ProducerIds are created when there are multiple instances of Producers - spring-boot

In the .yaml file, we have set
spring.cloud.stream.kafka.binder.configuration.enable.idempotence to true.
Now when the application starts up, we can see a log like
[kafka-producer-network-thread | test_clientId] org.apache.kafka.clients.producer.internals.TransactionManager - [Producer clientId=test_clientId] ProducerId set to 0 with epoch 0
When the first message is produced to the topic, we can see that another ProducerId is used, as shown in the log below
[Ljava.lang.String;#720a86ef.container-0-C-1] org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=test_clientId] Instantiated an idempotent producer.
[Ljava.lang.String;#720a86ef.container-0-C-1] org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.0.1
[Ljava.lang.String;#720a86ef.container-0-C-1] org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : fa14705e51bd2ce5
[kafka-producer-network-thread | test_clientId] org.apache.kafka.clients.Metadata - Cluster ID: -9nblycHSsiksLIUbVH6Vw
1512361 INFO [kafka-producer-network-thread | test_clientId] org.apache.kafka.clients.producer.internals.TransactionManager - [Producer clientId=test_clientId] ProducerId set to 1 with epoch 0
Once the ProducerId is set to 1, no new ProducerIds are created when further messages are sent from this application.
But if we have multiple application instances running (all connecting to the same Kafka server),
then new ProducerIds are also created in each of those instances, both at startup and when the first message is sent.
Please suggest whether we can restrict the creation of new ProducerIds and keep using the one that was created when the application started.
Also, since a lot of ProducerIds are created, is there some way in which we can re-use the already created ones? (Assuming the application has multiple producers and each one creates multiple ProducerIds.)

The first producer is temporary - it is created to find the existing partitions for the topic during initialization. It is immediately closed.
The second producer is a single producer used for subsequent record sends.
The producerId and epoch are allocated by the broker. They have to be unique.
With a new broker you will get 0 and 1 for the first instance, 2 and 3 for the second instance, 4 and 5, ...
Even if you stop all instances, the next one will get 6 and 7.
Why do you worry about this?
On the other hand, if you set the client.id to, say, foo, you will always get foo-1 and foo-2 on all instances.
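For illustration, a minimal sketch of how the client.id could be set, assuming the same binder configuration map the question already uses for enable.idempotence also accepts it (the property path and the value foo are assumptions, not taken from the original question):

spring:
  cloud:
    stream:
      kafka:
        binder:
          configuration:
            enable.idempotence: true
            client.id: foo

With a fixed client.id like this, the two producers created per instance then appear in the logs as foo-1 and foo-2, as described above, which makes the log lines easier to correlate across instances.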

Related

Invalid state: The Flow Controller is initializing the Data Flow

I'm trying out a test scenario to add a new node to an already existing cluster (for now, 1 node) using an external ZooKeeper.
I'm constantly getting the repeated lines below, and on the UI: "Invalid state: The Flow Controller is initializing the Data Flow."
2022-02-28 17:51:29,668 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at nifi-02:9489; will use this address for sending heartbeat messages
2022-02-28 17:51:29,668 INFO [main] o.a.n.c.p.AbstractNodeProtocolSender Cluster Coordinator is located at nifi-02:9489. Will send Cluster Connection Request to this address
2022-02-28 17:51:37,572 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
2022-02-28 17:52:36,914 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#13c90c06 checkpointed with 1 Records and 0 Swap Files in 4 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 1 millis), max Transaction ID 1
2022-02-28 17:52:37,581 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
NiFi 1.15.3 is being used (unsecured setup).
It seems that the cluster coordinator is not running on the mentioned port for the node already in the cluster. I assumed this from the timeouts, yet the new node is able to detect that a cluster coordinator is present at the mentioned node. (How can this be solved?)
nc (netcat) also times out for the same port.

Spring cloud stream - Kafka consumer consuming duplicate messages with StreamListener

With our Spring Boot app, we notice the Kafka consumer consuming a message twice, randomly, once in a while, and only in the prod environment. We have 6 instances with 6 partitions deployed in PCF. We have caught messages with the same offset and partition received twice on the same topic, which causes duplicates and is business critical for us.
We haven't noticed this in non-production environments, and it is hard to reproduce there. We recently switched to Kafka and have not been able to find the root cause.
We are using spring-cloud-stream / spring-cloud-stream-binder-kafka 2.1.2.
Here is the Config:
spring:
  cloud:
    stream:
      default.consumer.concurrency: 1
      default-binder: kafka
      bindings:
        channel:
          destination: topic
          content_type: application/json
          autoCreateTopics: false
          group: group
          consumer:
            maxAttempts: 1
      kafka:
        binder:
          autoCreateTopics: false
          autoAddPartitions: false
          brokers: brokers list
        bindings:
          channel:
            consumer:
              autoCommitOnError: true
              autoCommitOffset: true
              configuration:
                max.poll.interval.ms: 1000000
                max.poll.records: 1
                group.id: group
We use @StreamListener to consume the messages.
Here is an instance where we received a duplicate, along with the error message captured in the server logs.
ERROR 46 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-3, groupId=group] Offset commit failed on partition topic-0 at offset 1291358: The coordinator is not aware of this member.
ERROR 46 --- [container-0-C-1] o.s.kafka.listener.LoggingErrorHandler : Error while processing: null
OUT org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:871) ~[kafka-clients-2.0.1.jar!/:na]
There is no crash, and all instances are healthy at the time of the duplicate. There is also confusion around the error log - "Error while processing: null" - since the message was in fact processed successfully, twice. And max.poll.interval.ms is 1000000, which is about 16 minutes and is supposed to be enough time to process any message in this system; the session timeout and heartbeat configs are at their defaults. The duplicate is received within 2 seconds in most of the instances.
Are there any configs that we are missing? Any suggestion/help is highly appreciated.
Commit cannot be completed since the group has already rebalanced
A rebalance occurred because your listener took too long; you should adjust max.poll.records and max.poll.interval.ms to make sure you can always handle the records received within the time limit.
In any case, Kafka does not guarantee exactly once delivery, only at least once delivery. You need to add idempotency to your application and detect/ignore duplicates.
Also, keep in mind that StreamListener and the annotation-based programming model have been deprecated for 3+ years and have been removed from the current main branch, which means the next release will not have them. Please migrate your solution to the functional programming model.
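As an illustration of that migration, here is a minimal sketch of a functional-style consumer that also does a naive in-memory duplicate check keyed on partition and offset; the class and bean names, payload type, and dedupe approach are assumptions for illustration, not part of the original question or answer:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.Message;

@Configuration
public class ChannelConsumerConfiguration {

    // Remembers recently seen partition/offset pairs. A real implementation
    // would need bounded and/or persistent storage to survive restarts.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Bean
    public Consumer<Message<String>> channel() {
        return message -> {
            Integer partition = message.getHeaders()
                    .get(KafkaHeaders.RECEIVED_PARTITION_ID, Integer.class);
            Long offset = message.getHeaders().get(KafkaHeaders.OFFSET, Long.class);
            String key = partition + "-" + offset;
            if (!seen.add(key)) {
                return; // duplicate delivery - already processed, skip it
            }
            process(message.getPayload());
        };
    }

    private void process(String payload) {
        // business logic goes here
    }
}

With the functional model, the binding name is derived from the bean name, so the destination and group would be configured under spring.cloud.stream.bindings.channel-in-0.* rather than the channel binding shown in the question's config; the header constant names may also differ slightly depending on the spring-kafka version.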

How to stop Preparing to rebalance group with old generation in Kafka?

I use Kafka for my web application, and I found the messages below in kafka.log:
[2021-07-06 08:49:03,658] INFO [GroupCoordinator 0]: Preparing to rebalance group qpcengine-group in state PreparingRebalance with old generation 105 (__consumer_offsets-28) (reason: removing member consumer-1-7eafeb56-e6fe-4161-9c88-e69c06a0ab37 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2021-07-06 08:49:03,658] INFO [GroupCoordinator 0]: Group qpcengine-group with generation 106 is now empty (__consumer_offsets-28) (kafka.coordinator.group.GroupCoordinator)
But Kafka seems to keep looping forever for one consumer.
How can I stop it?
If you only have one partition, you don't need to use a consumer group;
just try to use assign (not subscribe).
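A minimal sketch of what assign (instead of subscribe) could look like with the plain Kafka consumer; the topic name, broker address, and deserializers are placeholder assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AssignExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // No group.id is set: with manual assignment the consumer never joins a
        // group, so no rebalancing (and no heartbeat expiration) can occur.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manually assign partition 0 of the topic instead of subscribing.
            consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
            consumer.seekToBeginning(consumer.assignment());
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                // Offsets are not committed here; the application would have to
                // track its own position (e.g. via seek) across restarts.
            }
        }
    }
}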

kafka-streams instance on startup continuously logs "Found no committed offset for partition traces-1"

I have a kafka-streams app with 2 instances. This is a brand-new Kafka cluster, with all topics created and no messages written to them yet.
I start the first instance and see that it has transitioned from the REBALANCING to the RUNNING state.
Now I start the next instance and notice that it continuously logs the following:
2020-01-14 18:03:57.896 [streaming-app-f2457059-c9ec-4c21-a177-be54f8d59cb2-StreamThread-2] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=streaming-app-f2457059-c9ec-4c21-a177-be54f8d59cb2-StreamThread-2-consumer, groupId=streaming-app] Found no committed offset for partition traces-1

UNKNOWN_PRODUCER_ID When using apache kafka streams (scala)

I am running 3 instances of a service that I wrote using:
Scala 2.11.12
kafkaStreams 1.1.0
kafkaStreamsScala 0.2.1 (by lightbend)
The service uses Kafka streams with the following topology (high level):
InputTopic
Parse to known Type
Drop messages for which parsing failed
Split every single message into 6 new messages
On each message run: map.groupByKey.reduce(with local store).toStream.to
Everything works as expected, but I can't get rid of a WARN message that keeps appearing:
15:46:00.065 [kafka-producer-network-thread | my_service_name-1ca232ff-5a9c-407c-a3a0-9f198c6d1fa4-StreamThread-1-0_0-producer] [WARN ] [o.a.k.c.p.i.Sender] - [Producer clientId=my_service_name-1ca232ff-5a9c-407c-a3a0-9f198c6d1fa4-StreamThread-1-0_0-producer, transactionalId=my_service_name-0_0] Got error produce response with correlation id 28 on topic-partition my_service_name-state_store_1-repartition-1, retrying (2 attempts left). Error: UNKNOWN_PRODUCER_ID
As you can see, I get these errors from the INTERNAL topics that Kafka Streams manages. It seems like some kind of retention period on the producer metadata in the internal topics, or some kind of producer id reset.
I couldn't find anything regarding this issue, only a description of the error itself from here:
ERROR: UNKNOWN_PRODUCER_ID
CODE: 59
RETRIABLE: False
DESCRIPTION: This exception is raised by the broker if it could not locate the producer metadata associated with the producerId in question. This could happen if, for instance, the producer's records were deleted because their retention time had elapsed. Once the last records of the producer id are removed, the producer's metadata is removed from the broker, and future appends by the producer will return this exception.
Hope you can help,
Thanks
Edit:
It seems that the WARN message does not pop up on version 1.0.1 of kafka streams.
