Spring Kafka Batch Listener - commit offsets manually in batch

I am implementing a Spring Kafka batch listener, which reads a list of messages from a Kafka topic and posts the data to a REST service.
I would like to understand the offset management: if the REST service goes down, the offsets for the batch should not be committed and the messages should be reprocessed on the next poll. I have read the Spring Kafka documentation, but I am confused about the difference between the listener error handler and the seek-to-current container error handlers for batches. I am using spring-boot-2.0.0.M7 and below is my code.
Listener Config:
@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(Integer.parseInt(env.getProperty("spring.kafka.listener.concurrency")));
    // factory.getContainerProperties().setPollTimeout(3000);
    factory.getContainerProperties().setBatchErrorHandler(kafkaErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.BATCH);
    factory.setBatchListener(true);
    return factory;
}
@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> propsMap = new HashMap<>();
    propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("spring.kafka.bootstrap-servers"));
    propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
            env.getProperty("spring.kafka.consumer.enable-auto-commit"));
    propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG,
            env.getProperty("spring.kafka.consumer.auto-commit-interval"));
    propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, env.getProperty("spring.kafka.session.timeout"));
    propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, env.getProperty("spring.kafka.consumer.group-id"));
    return propsMap;
}
Listener Class:
@KafkaListener(topics = "${spring.kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory")
public void listen(List<String> payloadList) throws Exception {
    if (payloadList.size() > 0) {
        // Post to the service
    }
}
Kafka Error Handler:
public class KafkaErrorHandler implements BatchErrorHandler {

    private static final Logger LOGGER = LoggerFactory.getLogger(KafkaErrorHandler.class);

    @Override
    public void handle(Exception thrownException, ConsumerRecords<?, ?> data) {
        LOGGER.info("Exception occurred while processing::" + thrownException.getMessage());
    }
}
How should I handle the Kafka listener so that if something goes wrong while processing a batch of records, I don't lose data?

With Apache Kafka we never lose data. Records stay in the partition log, and the offset lets us seek to any arbitrary position.
On the other hand, when we consume records from a partition there is no requirement to commit their offsets - the current consumer holds its position in memory. Committing is only needed for the benefit of other, new consumers in the same group that take over after the current one dies. Independently of errors, the current consumer always moves on and polls new data past its in-memory position.
So, to reprocess the same data in the same consumer we definitely have to use a seek operation to move the consumer back to the desired position. That's why Spring Kafka introduces SeekToCurrentErrorHandler:
This allows implementations to seek all unprocessed topic/partitions so the current record (and the others remaining) will be retrieved by the next poll. The SeekToCurrentErrorHandler does exactly this.
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_seek_to_current_container_error_handlers
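A minimal sketch of that approach for this batch listener, assuming a spring-kafka version that provides SeekToCurrentBatchErrorHandler (2.1 and later); it replaces the logging-only KafkaErrorHandler, and the listener must let the exception from the failed REST call propagate so the container invokes the handler:

@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setBatchListener(true);
    factory.getContainerProperties().setAckMode(AckMode.BATCH);
    // Seeks all partitions of the failed batch back to the unprocessed records,
    // so nothing is committed and the same records are returned by the next poll.
    factory.getContainerProperties().setBatchErrorHandler(new SeekToCurrentBatchErrorHandler());
    return factory;
}

With this in place, when the REST service is down the batch's offsets are never committed and the same records are fetched again on the next poll.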

Related

Question on Spring Kafka Listener Consumer Offset Acknowledgement

I have created the below consumer factory.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setAutoStartup(autoStart);
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    return factory;
}
The Kafka listener is given below.
@KafkaListener(id = "${topic1}",
        topics = "${topic1}",
        groupId = "${consumer.group1}", concurrency = "1", containerFactory = "kafkaListenerContainerFactory")
public void consumeEvents1(String jsonObject, @Headers Map<String, String> header, Acknowledgment acknowledgment) {
    LOG.info("Message - {}", jsonObject);
    LOG.info(header.get(KafkaHeaders.GROUP_ID) + header.get(KafkaHeaders.RECEIVED_TOPIC) + String.valueOf(header.get(KafkaHeaders.OFFSET)));
    acknowledgment.acknowledge();
}
In the consumer factory, I did not set factory.setBatchListener(true). My understanding is that the above listener is called for each message because it is not a batch listener, and that is the behavior I observed; with a batch listener I would get a list of messages instead of one message at a time.
Since the listener is not batch-based, is acknowledgment.acknowledge() going to behave the same for MANUAL and MANUAL_IMMEDIATE? Is that the correct understanding?
I referred to the below material.
With MANUAL, the commit is queued until the whole batch is processed; this is more efficient, but increases the possibility of getting redeliveries.
With MANUAL_IMMEDIATE, the commit occurs right away, as long as you call it on the listener thread.
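For reference, switching between the two modes is only a change of the ack mode on the factory shown above; a minimal sketch (not from the original post):

// Same factory as above; only the AckMode differs.
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);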

spring-kafka consumer batch error handling with spring boot version 2.3.7

I am trying to implement error handling for Spring Kafka batch processing. First of all, I have a few questions.
What is the difference between listener and container error handlers, and which errors fall into each of these two categories?
Could you please share some samples to help understand this better?
Here is our design:
Poll at a certain interval.
Consume messages in batch mode.
Push them to a local cache (application cache) keyed by id (to avoid duplicate events).
Push all values one by one to another topic once the batch processing is done.
Clear the cache once operation 3 is done and acknowledge the offsets manually.
Here is my plan to have error handling:
public ConcurrentKafkaListenerContainerFactory<String, String> myListenerPartitionContainerFactory(String groupId) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory(groupId));
    factory.setConcurrency(partionCount);
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    factory.getContainerProperties().setIdleBetweenPolls(pollInterval);
    factory.setBatchListener(true);
    return factory;
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> myPartitionsListenerContainerFactory() {
    return myListenerPartitionContainerFactory(groupIdPO);
}
@Bean
public RecoveringBatchErrorHandler recoveringBatchErrorHandler(KafkaTemplate<String, String> errorKafkaTemplate) {
    DeadLetterPublishingRecoverer recoverer =
            new DeadLetterPublishingRecoverer(errorKafkaTemplate);
    return new RecoveringBatchErrorHandler(recoverer, new FixedBackOff(2L, 5000)); // push error event to the error topic
}
@KafkaListener(id = "mylistener", topics = "someTopic", containerFactory = "myPartitionsListenerContainerFactory")
public void listen(List<ConsumerRecord<String, String>> records, @Header(KafkaHeaders.MESSAGE_KEY) String key, Acknowledgment ack) {
    Map<String, ConsumerRecord<String, String>> map = new HashMap<>();
    records.forEach(record -> {
        try {
            // key will be formed based on the input record - it will be the id.
            map.put(key, record);
        }
        catch (Exception e) {
            throw new BatchListenerFailedException("Failed to process", record);
        }
    });
    // Once successful, push each message to another topic.
    try {
        map.forEach((k, v) -> { /* push to another topic */ });
        map.clear();
        ack.acknowledge();
    } catch (Exception ex) {
        // handle producer exceptions
    }
}
Is this direction good, or are any improvements needed? Also, what type of container and listener error handlers need to be implemented?
@Gary Russell, could you please help with this?
The listener error handler is intended for request/reply situations where the error handler can return a meaningful reply to the sender.
You need to throw an exception to trigger the container error handler, and you need to know the index of the failed record in the original batch to tell it which record failed.
If you are using manual acks like that, you can use the nack() method to indicate which record failed (and not throw an exception in that case).
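A minimal sketch of the nack() approach under the manual-ack batch setup above (the process() helper is a placeholder for the cache/produce logic, not part of the original code); nack(index, sleep) commits the offsets of the records before the failed index and re-seeks so the failed record and the rest of the batch are redelivered:

@KafkaListener(id = "mylistener", topics = "someTopic", containerFactory = "myPartitionsListenerContainerFactory")
public void listen(List<ConsumerRecord<String, String>> records, Acknowledgment ack) {
    for (int i = 0; i < records.size(); i++) {
        try {
            process(records.get(i)); // placeholder for the cache/produce logic
        }
        catch (Exception e) {
            // Do not throw; report the failed index so earlier offsets are committed
            // and the failed record (plus the remainder) is redelivered after 5 seconds.
            ack.nack(i, 5000);
            return;
        }
    }
    ack.acknowledge(); // whole batch processed successfully
}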

Spring Batch Parallel processing with JMS

I implemented a Spring Batch project that reads from a WebLogic JMS queue (a custom ItemReader, not message driven), then passes the JMS message data to an ItemWriter (chunk = 1) where I call some APIs and write to a database.
However, I am trying to implement parallel JMS processing: reading JMS messages in parallel and passing them to the writer without waiting for the previous items to complete.
I've used a DefaultMessageListenerContainer in a previous project and it offers parallel consumption of JMS messages, but in this project I have to use the Spring Batch framework.
I tried the easiest solution (a multi-threaded step), but it didn't work: JmsException: "invalid blocking receive when another receive is in progress", which probably means my reader is stateful.
I thought about using remote partitioning, but then I would have to read all messages and put the data into step execution contexts before calling the worker steps, which isn't really efficient when dealing with a large number of messages.
I looked a little into remote chunking; I understand that it passes data via queue channels, but I can't see the utility of reading from JMS only to put the messages in a local queue for the workers.
How can I approach this?
My code:
@Bean
Step step1() {
    return steps.get("step1").<Message, DetectionIncoherenceLiqJmsOut>chunk(1)
            .reader(reader()).processor(processor()).writer(writer())
            .listener(stepListener()).build();
}

@Bean
Job job(@Qualifier("step1") Step step1) {
    return jobs.get("job").start(step1).build();
}
JMS code:
@Override
public void initQueueConnection() throws NamingException, JMSException {
    Hashtable<String, String> properties = new Hashtable<String, String>();
    properties.put(Context.INITIAL_CONTEXT_FACTORY, env.getProperty(WebLogicConstant.JNDI_FACTORY));
    properties.put(Context.PROVIDER_URL, env.getProperty(WebLogicConstant.JMS_WEBLOGIC_URL_RECEIVE));
    InitialContext vInitialContext = new InitialContext(properties);
    QueueConnectionFactory vQueueConnectionFactory = (QueueConnectionFactory) vInitialContext
            .lookup(env.getProperty(WebLogicConstant.JMS_FACTORY_RECEIVE));
    vQueueConnection = vQueueConnectionFactory.createQueueConnection();
    vQueueConnection.start();
    vQueueSession = vQueueConnection.createQueueSession(false, 0);
    Queue vQueue = (Queue) vInitialContext.lookup(env.getProperty(WebLogicConstant.JMS_QUEUE_RECEIVE));
    consumer = vQueueSession.createConsumer(vQueue, "JMSCorrelationID IS NOT NULL");
}

@Override
public Message receiveMessages() throws NamingException, JMSException {
    return consumer.receive(20000);
}
Item reader:
@Override
public Message read() throws Exception {
    return jmsServiceReceiver.receiveMessages();
}
Thanks! I'll appreciate the help :)
There's a BatchMessageListenerContainer in the spring-batch-infrastructure-tests sub project.
https://github.com/spring-projects/spring-batch/blob/d8fc58338d3b059b67b5f777adc132d2564d7402/spring-batch-infrastructure-tests/src/main/java/org/springframework/batch/container/jms/BatchMessageListenerContainer.java
Message listener container adapted for intercepting the message reception with advice provided through configuration.
To enable batching of messages in a single transaction, use the TransactionInterceptor and the RepeatOperationsInterceptor in the advice chain (with or without a transaction manager set in the base class). Instead of receiving a single message and processing it, the container will then use a RepeatOperations to receive multiple messages in the same thread. Use with a RepeatOperations and a transaction interceptor. If the transaction interceptor uses XA then use an XA connection factory, or else the TransactionAwareConnectionFactoryProxy to synchronize the JMS session with the ongoing transaction (opening up the possibility of duplicate messages after a failure). In the latter case you will not need to provide a transaction manager in the base class - it only gets in the way and prevents the JMS session from synchronizing with the database transaction.
Perhaps you could adapt it for your use case.
I was able to do so with a multi-threaded step:
// Jobs and Steps
@Bean
Step stepDetectionIncoherencesLiq(@Autowired StepBuilderFactory steps) {
    int threadSize = Integer.parseInt(env.getProperty(PropertyConstant.THREAD_POOL_SIZE));
    return steps.get("stepDetectionIncoherencesLiq").<Message, DetectionIncoherenceLiqJmsOut>chunk(1)
            .reader(reader()).processor(processor()).writer(writer())
            .readerIsTransactionalQueue()
            .faultTolerant()
            .taskExecutor(taskExecutor())
            .throttleLimit(threadSize)
            .listener(stepListener())
            .build();
}
And a JmsItemReader with a JmsTemplate instead of creating sessions and connections explicitly; the template manages connections, so I no longer get the JMS exception (JmsException: "invalid blocking receive when another receive is in progress"):
@Bean
public JmsItemReader<Message> reader() {
    JmsItemReader<Message> itemReader = new JmsItemReader<>();
    itemReader.setItemType(Message.class);
    itemReader.setJmsTemplate(jmsTemplate());
    return itemReader;
}
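The jmsTemplate() bean referenced above is not shown in the post; a minimal sketch of what it might look like, reusing the same WebLogic JNDI properties as the original initQueueConnection() code (the exact lookup keys and the fixed receive timeout are assumptions, not confirmed by the poster):

@Bean
public JmsTemplate jmsTemplate() throws NamingException {
    // Look up the WebLogic connection factory and queue over JNDI,
    // reusing the same properties as initQueueConnection() above.
    Hashtable<String, String> props = new Hashtable<>();
    props.put(Context.INITIAL_CONTEXT_FACTORY, env.getProperty(WebLogicConstant.JNDI_FACTORY));
    props.put(Context.PROVIDER_URL, env.getProperty(WebLogicConstant.JMS_WEBLOGIC_URL_RECEIVE));
    InitialContext context = new InitialContext(props);
    ConnectionFactory connectionFactory =
            (ConnectionFactory) context.lookup(env.getProperty(WebLogicConstant.JMS_FACTORY_RECEIVE));
    Destination queue = (Destination) context.lookup(env.getProperty(WebLogicConstant.JMS_QUEUE_RECEIVE));

    JmsTemplate template = new JmsTemplate(connectionFactory);
    template.setDefaultDestination(queue);
    template.setReceiveTimeout(20000); // same timeout as the original consumer.receive(20000)
    return template;
}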

Error handling - Consumer - Apache Kafka and Spring

I am learning to use Kafka. I have two services, a producer and a consumer.
The producer produces messages that require processing (queries to services and a database). These messages are received by the consumer, which is responsible for processing them and saving the result in a database.
Producer
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
...
kafkaTemplate.send(topic, message);
Consumer
@KafkaListener(topics = "....")
public void listen(@Payload String message) {
    ....
}
I would like all messages to be processed correctly by the consumer.
I do not know how to handle errors on the consumer side in this context. For example, the database might be temporarily down and unable to handle certain messages.
What should I do in these cases?
I know that the responsibility belongs to the consumer.
I could retry, but retrying several times in a row while the database is down does not seem like a good idea. And if I keep consuming messages, the offset advances and I lose the events that I could not process.
You have control over the Kafka consumer in the form of committing the offsets of the records you have read. If offsets are not committed, the consumer resumes from the last committed offset after a restart or rebalance, so unprocessed records are redelivered. You can set offset committing to manual and, based on the success of your business logic, decide whether to commit or not. See a sample below:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);
        consumer.commitSync();
        buffer.clear();
    }
}
consumer.commitSync() commits the offsets.
Also see the Kafka consumer documentation to understand consumer offsets.
This link was very helpful: https://dzone.com/articles/spring-for-apache-kafka-deep-dive-part-1-error-han
Spring provides the DeadLetterPublishingRecoverer class, which performs correct handling of errors.
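In Spring Kafka terms, that usually means pairing the recoverer with a seek-to-current error handler so a failed record is retried a few times and then published to a dead-letter topic instead of being skipped. A minimal sketch, assuming spring-kafka 2.3+ (where SeekToCurrentErrorHandler accepts a BackOff) and an existing KafkaTemplate bean named kafkaTemplate:

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory, KafkaTemplate<String, String> kafkaTemplate) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // After retries are exhausted, publish the failed record to <topic>.DLT.
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(kafkaTemplate);
    // Retry the failed record 2 more times, 1 second apart, then recover to the dead-letter topic.
    factory.setErrorHandler(new SeekToCurrentErrorHandler(recoverer, new FixedBackOff(1000L, 2)));
    return factory;
}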

Spring Kafka listenerExecutor

I'm setting up a Kafka listener in a Spring Boot application and I can't seem to get the listener running in a pool using an executor. Here's my Kafka configuration:
@Bean
ThreadPoolTaskExecutor messageProcessorExecutor() {
    logger.info("Creating a message processor pool with {} threads", numThreads);
    ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
    exec.setCorePoolSize(200);
    exec.setMaxPoolSize(200);
    exec.setKeepAliveSeconds(30);
    exec.setAllowCoreThreadTimeOut(true);
    exec.setQueueCapacity(0); // Yields a SynchronousQueue
    exec.setThreadFactory(ThreadFactoryFactory.defaultNamingFactory("kafka", "processor"));
    return exec;
}
@Bean
public ConsumerFactory<String, PollerJob> consumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    DefaultKafkaConsumerFactory<String, PollerJob> factory = new DefaultKafkaConsumerFactory<>(props,
            new StringDeserializer(),
            new JsonDeserializer<>(PollerJob.class));
    return factory;
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, PollerJob> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, PollerJob> factory
            = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(Integer.valueOf(kafkaThreads));
    factory.getContainerProperties().setListenerTaskExecutor(messageProcessorExecutor());
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
    return factory;
}
The ThreadFactoryFactory used by the ThreadPoolTaskExecutor just makes sure each thread is named like 'kafka-1-processor-1'.
The ConsumerFactory has the ENABLE_AUTO_COMMIT_CONFIG flag set to false, and I'm using manual acknowledgement mode, which is required to use executors according to the documentation.
My listener looks like this:
@KafkaListener(topics = "my_topic",
        group = "my_group",
        containerFactory = "kafkaListenerContainerFactory")
public void listen(@Payload SomeJob job, Acknowledgment ack) {
    ack.acknowledge();
    logger.info("Running job {}", job.getId());
    ....
}
Using the Admin Server I can inspect all the threads, and only one kafka-N-processor-N thread is being created, but I expected to see up to 200. The jobs all run one at a time on that one thread, and I can't figure out why.
How can I get this setup to run the listeners on my executor with as many threads as possible?
I'm using Spring Boot 1.5.4.RELEASE and Kafka 0.11.0.0.
If your topic has only one partition, then according to the consumer group policy only one consumer is able to poll that partition.
The ConcurrentMessageListenerContainer indeed creates as many target KafkaMessageListenerContainer instances as the provided concurrency - and it does that only when it doesn't know the number of partitions in the topic.
When the rebalance in the consumer group happens, only one consumer gets the partition for consuming. All the work is really done there in a single thread:
private void startInvoker() {
    ListenerConsumer.this.invoker = new ListenerInvoker();
    ListenerConsumer.this.listenerInvokerFuture = this.containerProperties.getListenerTaskExecutor()
            .submit(ListenerConsumer.this.invoker);
}
One partition - one thread for sequential records processing.
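If the goal is parallel processing of jobs rather than parallel consumption, one option (a sketch, not part of the original answer) is to keep the single consumer thread and hand each record off to the messageProcessorExecutor defined above (assumed to be injected into the listener bean); note that acknowledging before processing gives at-most-once semantics, so a crash can lose in-flight jobs:

@KafkaListener(topics = "my_topic", containerFactory = "kafkaListenerContainerFactory")
public void listen(@Payload SomeJob job, Acknowledgment ack) {
    ack.acknowledge(); // committed before processing: at-most-once
    // The single consumer thread only dispatches; the pool threads do the work.
    messageProcessorExecutor.execute(() -> {
        logger.info("Running job {}", job.getId());
        // ... actual processing ...
    });
}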
