How to implement seekToEnd() from the AbstractConsumerSeekAware class in Spring Kafka?

We have a Spring Boot Kafka consumer app that reads from a Kafka topic. Our requirement is that whenever the app goes down and comes back up, it should start from the latest offset and not be bothered by old values. Is there a way to reset the offset of the group? I researched a bit and used AbstractConsumerSeekAware to set the offset to the end using seekToEnd(), as seen in the code below.
public class KafkaConsumer extends AbstractConsumerSeekAware {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @KafkaListener(topics = "${topic.consumer}")
    public void consume(ConsumerRecord<?, ?> consumerRecord, Acknowledgment ack) {
        // consuming the message from the ConsumerRecord
        seekToEnd();
        doSomething();
        ack.acknowledge();
    }
}
However, when we stopped the app and restarted it, it resumed reading from the last offset where it left off, but we want it to read only from the offset at the time the app started. How can we achieve this?

It is not correct to do that in the @KafkaListener method; that method is only called when the consumer delivers records from the partition(s).
Instead, you must override onPartitionsAssigned(), so the consumer seeks those partitions before it starts polling them.
See more in docs: https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
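For example, a minimal sketch (the listener class name and doSomething() are placeholders for illustration) of a listener that extends AbstractConsumerSeekAware and seeks every assigned partition to the end whenever the consumer joins the group:

public class LatestOnlyListener extends AbstractConsumerSeekAware {

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        super.onPartitionsAssigned(assignments, callback);
        // skip everything that accumulated while the app was down
        callback.seekToEnd(assignments.keySet());
    }

    @KafkaListener(topics = "${topic.consumer}")
    public void consume(ConsumerRecord<?, ?> consumerRecord, Acknowledgment ack) {
        doSomething(consumerRecord); // placeholder for the actual processing
        ack.acknowledge();
    }
}

With this in place there is no need to call seekToEnd() from the listener method itself.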

Related

What is the difference between events, channels, and topics?

With reference to Kafka, what is the difference between all of these?
Let's say I have a component "Order" which must emit events to a Kafka channel when I create/cancel/modify orders.
I create a channel named "Order-out". Is "topic" the name I can use for this channel?
What is a topic vs. a channel?
And there is the Order-Details component, which creates and maintains records of all such orders.
I want to use an OrderEvents class inside the subscriber section of this component.
public class OrderEvents {
public static final String ORDER_CREATED = "ORDER_CREATED";
public static final String ORDER_MODIFIED = "ORDER_MODIFIED";
public static final String ORDER_CANCELLED = "ORDER_CANCELLED";
}
An event is a single record. In Spring, you might work with a Message class to wrap an event.
Channel is a Spring Integration term used via Spring-Kafka or Spring Cloud Stream Binders for inputs and outputs. A Binder determines the implementation of the Channel.
Topic is a Kafka unit of organization.
An event will be serialized into bytes, and sent via a channel to a Kafka topic.
A Kafka record will be consumed from a Kafka topic, through a channel, and deserialized into an application event class.
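To make the relationship concrete, here is a small hedged sketch (the "Order-out" topic name, the publisher class, and the KafkaTemplate bean are assumptions based on the question) that wraps an order event in a Spring Message and sends it to a Kafka topic:

@Service
public class OrderEventPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public OrderEventPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void orderCreated() {
        Message<String> message = MessageBuilder
                .withPayload(OrderEvents.ORDER_CREATED)     // the event payload
                .setHeader(KafkaHeaders.TOPIC, "Order-out") // the Kafka topic this "channel" maps to
                .build();
        kafkaTemplate.send(message); // serialized to bytes and written as a record
    }
}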

Restart listener and continue from latest message

Case
Clients are ReplyingKafkaTemplate instances.
Server is a ConcurrentMessageListenerContainer created using @KafkaListener and @SendTo annotations on a method.
ContainerFactory uses ContainerStoppingErrorHandler.
Request topic has only 1 partition.
Group ids are static. eg. test-consumer-group.
Requests are sent with timeouts.
Due to an exception being thrown, the server goes down, but the client keeps dispatching requests, which queue up on the request topic.
Current Behavior
When the server comes back up it continues processing old requests which would have timed out.
Desired Behavior
Instead, it would be better to continue with the latest message, skipping past unprocessed messages, since the corresponding requests would time out and be retried.
Questions
What is the recommended approach to achieve this?
From the little that I understand, it looks like I'll have to manually set the initial offset. What's the simplest way to implement this?
Your @KafkaListener class must extend AbstractConsumerSeekAware and do something like this:

@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
    super.onPartitionsAssigned(assignments, callback);
    callback.seekToEnd(assignments.keySet());
}
This way, every time your consumer joins the group it seeks all of its assigned partitions to the end, skipping all of the old records.

Spring-Kafka Concurrency Property

I am writing my first Kafka consumer using Spring Kafka. I had a look at the different options provided by the framework and have a few doubts. Can someone please clarify the points below if you have already worked with it?
Question 1: As per the Spring Kafka documentation, there are 2 ways to implement a Kafka consumer: "You can receive messages by configuring a MessageListenerContainer and providing a message listener or by using the @KafkaListener annotation". Can someone tell me when I should choose one option over the other?
Question 2: I have chosen the @KafkaListener approach for my application. For this I need to initialize a container factory instance, and the container factory has an option to control concurrency. I just want to double-check whether my understanding of concurrency is correct.
Suppose I have a topic named MyTopic with 4 partitions. To consume messages from MyTopic, I've started 2 instances of my application, and these instances are started with concurrency set to 2. Ideally, per the Kafka assignment strategy, 2 partitions should go to consumer1 and the other 2 partitions to consumer2. Since concurrency is set to 2, will each consumer start 2 threads and consume data from the topic in parallel? Also, is there anything we should consider when consuming in parallel?
Question 3: I have chosen manual ack mode and am not managing offsets externally (not persisting them to any database/filesystem). Do I need to write custom code to handle a rebalance, or will the framework manage it automatically? I think not, as I acknowledge only after processing all the records.
Question 4: Also, with manual ack mode, which listener gives better performance: the batch message listener or the normal message listener? I guess that with the normal message listener the offsets will be committed after each message is processed.
Pasted the code below for your reference.
Batch Acknowledgement Consumer:
public void onMessage(List<ConsumerRecord<String, String>> records, Acknowledgment acknowledgment,
        Consumer<?, ?> consumer) {
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("Record : " + record.value());
        // Process the message here..
        listener.addOffset(record.topic(), record.partition(), record.offset());
    }
    acknowledgment.acknowledge();
}
Initialising container factory:
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<String, String>(consumerConfigs());
}

@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> configs = new HashMap<String, Object>();
    configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServer);
    configs.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    configs.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, enableAutoCommit);
    configs.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, maxPollInterval);
    configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
    configs.put(ConsumerConfig.CLIENT_ID_CONFIG, clientId);
    configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return configs;
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<String, String>();
    // Not sure about the impact of this property
    factory.setConcurrency(2);
    factory.setBatchListener(true);
    factory.getContainerProperties().setAckMode(AckMode.MANUAL);
    factory.getContainerProperties().setConsumerRebalanceListener(RebalanceListener.getInstance());
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setMessageListener(new BatchAckConsumer());
    return factory;
}
@KafkaListener is a message-driven "POJO" listener; it adds features such as payload conversion, argument matching, and so on. If you implement MessageListener, you can only get the raw ConsumerRecord from Kafka. See @KafkaListener Annotation in the reference documentation.
Yes, the concurrency represents the number of threads; each thread creates a Consumer; they run in parallel; in your example, each would get 2 partitions.
Also, is there anything we should consider when consuming in parallel?
Your listener must be thread-safe (no shared state, or any such state must be protected by locks).
It's not clear what you mean by "handle rebalance events". When a rebalance occurs, the framework will commit any pending offsets.
It doesn't make a difference; message listener vs. batch listener is just a preference. Even with a message listener, with MANUAL ack mode the offsets are committed when all the records from the poll have been processed. With MANUAL_IMMEDIATE mode, the offsets are committed one by one.
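As an illustration, a minimal record-listener sketch (the topic name and process() are placeholders): with MANUAL the acknowledgment is collected and committed after the whole poll has been processed, while MANUAL_IMMEDIATE commits it straight away.

@KafkaListener(topics = "MyTopic")
public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
    process(record);   // placeholder for your processing logic
    ack.acknowledge(); // MANUAL: queued until the poll completes; MANUAL_IMMEDIATE: committed immediately
}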
Q1:
From the documentation,
The #KafkaListener annotation is used to designate a bean method as a
listener for a listener container. The bean is wrapped in a
MessagingMessageListenerAdapter configured with various features, such
as converters to convert the data, if necessary, to match the method
parameters.
You can configure most attributes on the annotation with SpEL by using #{…} or property placeholders (${…}). See the Javadoc for more information.
This approach can be useful for simple POJO listeners, and you do not need to implement any interfaces. You can also listen on any topics and partitions declaratively using the annotations, and you can potentially return the value you received, whereas with MessageListener you are bound by the signature of the interface.
Q2:
Ideally, yes. It gets more complicated if you have multiple topics to consume from, though. By default, Kafka uses the RangeAssignor, which has its own behaviour (you can change this if you need to).
Q3:
If your consumer dies, there will be a rebalance. If you acknowledge manually and your consumer dies before committing the offsets, you do not need to do anything; Kafka handles that. However, you could end up with some duplicate messages (at-least-once delivery).
Q4:
It depends on what you mean by "performance". If you mean latency, then consuming each record as soon as possible is the way to go. If you want high throughput, then batch consumption is more efficient.
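For instance, a hedged sketch of a batch listener (the topic name, the container factory bean name, and process() are placeholders); a single acknowledgment covers the whole poll, and fewer commits generally means higher throughput:

@KafkaListener(topics = "MyTopic", containerFactory = "kafkaListenerContainerFactory")
public void listenBatch(List<ConsumerRecord<String, String>> records, Acknowledgment ack) {
    for (ConsumerRecord<String, String> record : records) {
        process(record); // placeholder for your processing logic
    }
    ack.acknowledge();   // one commit for the entire batch
}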
I have written some samples using Spring Kafka and various listeners - check out this repo.

Kafka producer threads: huge amount of threads even when no message is sent

I profiled my Kafka producer Spring Boot application and found many "kafka-producer-network-thread"s running (47 in total), which never stop running, even when no data is being sent.
My application looks a bit like this:
var kafkaSender = KafkaSender(kafkaTemplate, applicationProperties)
kafkaSender.sendToKafka(json, rs.getString("KEY"))
with the KafkaSender:
@Service
class KafkaSender(val kafkaTemplate: KafkaTemplate<String, String>, val applicationProperties: ApplicationProperties) {

    @Transactional(transactionManager = "kafkaTransactionManager")
    fun sendToKafka(message: String, stringKey: String) {
        kafkaTemplate.executeInTransaction { kt ->
            kt.send(applicationProperties.kafka.topic, System.currentTimeMillis().mod(10).toInt(),
                System.currentTimeMillis().rem(10).toString(), message)
        }
    }

    companion object {
        val log = LoggerFactory.getLogger(KafkaSender::class.java)!!
    }
}
Since I instantiate a new KafkaSender each time I want to send a message to Kafka, I thought a new thread would be created, which would then send the message to Kafka.
Currently it looks like a pool of producers is generated, but never cleaned up, even when none of them has anything to do.
Is this behaviour intended?
In my opinion the behaviour should be nearly the same as datasource pooling: keep the thread alive for some time, but clear it up when there is nothing to do.
When using transactions, the producer cache grows on demand and is not reduced.
If you are producing messages on a listener container (consumer) thread, there is a producer for each topic/partition/consumer group. This is required to solve the zombie fencing problem, so that if a rebalance occurs and the partition moves to a different instance, the transaction id remains the same and the broker can properly handle the situation.
If you don't care about the zombie fencing problem (and you can handle duplicate deliveries), set the producerPerConsumerPartition property to false on the DefaultKafkaProducerFactory and the number of producers will be much smaller.
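A minimal sketch of that configuration (in Java rather than the Kotlin above; the bootstrap address and transaction id prefix are placeholder values):

@Bean
public ProducerFactory<String, String> producerFactory() {
    Map<String, Object> configs = new HashMap<>();
    configs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
    configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    DefaultKafkaProducerFactory<String, String> factory = new DefaultKafkaProducerFactory<>(configs);
    factory.setTransactionIdPrefix("tx-");           // enables transactional producers
    factory.setProducerPerConsumerPartition(false);  // smaller producer cache, but no zombie fencing
    return factory;
}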
EDIT
Starting with version 2.8, the default EOSMode is now V2 (aka BETA), which means it is no longer necessary to have a producer per topic/partition/group, as long as the broker version is 2.5 or later.

Spring Kafka: Poll for new messages instead of being notified using `onMessage`

I am using Spring Kafka in my project, as it seemed a natural choice for consuming Kafka messages in a Spring-based project. To consume messages, I can make use of the MessageListener interface. Spring Kafka internally takes care of invoking my onMessage method for each new message.
However, in my setting I prefer to explicitly poll for new messages and work on them sequentially (which will take a few seconds). As a workaround, I might just block inside my onMessage implementation, or buffer the messages internally. However, this seems to go against the core idea of Spring Kafka.
Kafka is designed so that consumers have to poll for new messages, which matches my requirements. Is there a way to make use of this "natural" workflow with Spring Kafka?
Should I refrain from using Spring Kafka for this use case?
The KafkaConsumer documentation states:
For use cases where message processing time varies unpredictably,
neither of these options may be sufficient. The recommended way to
handle these cases is to move message processing to another thread,
which allows the consumer to continue calling poll while the processor
is still working. Some care must be taken to ensure that committed
offsets do not get ahead of the actual position. Typically, you must
disable automatic commits and manually commit processed offsets for
records only after the thread has finished handling them (depending on
the delivery semantics you need). Note also that you will need to
pause the partition so that no new records are received from poll
until after thread has finished handling those previously returned.
Related issue: https://github.com/spring-projects/spring-kafka/issues/195
The issue with having to keep polling the consumer has been resolved (in 0.10.1.x by KIP-62), so that is no longer a problem, as long as you don't exceed max.poll.interval.ms, which is 5 minutes by default but can be increased.
However, if you want to poll yourself, you can still use spring-kafka (e.g. to get the Spring Boot auto configuration goodness if you are using Boot), but you can get a Consumer from the DefaultKafkaConsumerFactory and poll() it directly.
Here is how I do it. This is in the context of an integration test configuration class, which I load in my JUnit with:
@Import(IntegrationTestConfiguration.class)
In my Test class I have the following:
@Autowired
Consumer<String, String> consumer;
In my test configuration class, I have:
@Bean
public Consumer<String, String> consumer() {
    String bootstrapAddress = "server:port"; // fix this
    String groupId = "my.group"; // fix this.
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    ConsumerFactory<String, String> fact = new DefaultKafkaConsumerFactory<>(props);
    // Create the consumer, subscribe to the topic
    Consumer<String, String> consumer = fact.createConsumer();
    String topic = "my.topic"; // fix this.
    List<String> topics = new ArrayList<String>();
    topics.add(topic);
    consumer.subscribe(topics);
    return consumer;
}
Finally, in my test, I do:
@Test
public void testSomething() {
    // Do stuff that will publish a message to Kafka.
    // Repeat a number of times until you get the message you want...
    // Or you give up.
    Duration d = Duration.ofSeconds(2);
    ConsumerRecords<String, String> records = consumer.poll(d);
}
