Spring Kafka listenerExecutor - spring

I'm setting up a kafka listener in a spring boot application and I can't seem to get the listener running in a pool using an executor. Here's my kafka configuration:
#Bean
ThreadPoolTaskExecutor messageProcessorExecutor() {
logger.info("Creating a message processor pool with {} threads", numThreads);
ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(200);
exec.setMaxPoolSize(200);
exec.setKeepAliveSeconds(30);
exec.setAllowCoreThreadTimeOut(true);
exec.setQueueCapacity(0); // Yields a SynchronousQueue
exec.setThreadFactory(ThreadFactoryFactory.defaultNamingFactory("kafka", "processor"));
return exec;
}
#Bean
public ConsumerFactory<String, PollerJob> consumerFactory() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup);
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
DefaultKafkaConsumerFactory<String, PollerJob> factory = new DefaultKafkaConsumerFactory<>(props,
new StringDeserializer(),
new JsonDeserializer<>(PollerJob.class));
return factory;
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, PollerJob> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, PollerJob> factory
= new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setConcurrency(Integer.valueOf(kafkaThreads));
factory.getContainerProperties().setListenerTaskExecutor(messageProcessorExecutor());
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
return factory;
}
The ThreadFactoryFactory used by the ThreadPoolTaskExecutor just makes sure the thread is named like 'kafka-1-processor-1'.
The ConsumerFactory has the ENABLE_AUTO_COMMIT_CONFIG flag set to false and I'm using manual mode for the acknowledgement which is required to use executors according to the documentation.
My listener looks like this:
#KafkaListener(topics = "my_topic",
group = "my_group",
containerFactory = "kafkaListenerContainerFactory")
public void listen(#Payload SomeJob job, Acknowledgment ack) {
ack.acknowledge();
logger.info("Running job {}", job.getId());
....
}
Using the Admin Server I can inspect all the threads and only one kafka-N-processor-N threads is being created but I expected to see up to 200. The jobs are all running one at a time on the that one thread and I can't figure out why.
How can I get this setup to run the listeners using my executor with as many threads as possible?
I'm using Spring Boot 1.5.4.RELEASE and kafka 0.11.0.0.

If your topic has only one partition, according the consumer group policy, only one consumer is able to poll that partition.
The ConcurrentMessageListenerContainer indeed creates as much target KafkaMessageListenerContainer instances as provided concurrency. And it does that only in case it doesn't know the number of partitions in the topic.
When the rebalance in consumer group happens only one consumer gets partition for consuming. All the work is really done there in a single thread:
private void startInvoker() {
ListenerConsumer.this.invoker = new ListenerInvoker();
ListenerConsumer.this.listenerInvokerFuture = this.containerProperties.getListenerTaskExecutor()
.submit(ListenerConsumer.this.invoker);
}
One partition - one thread for sequential records processing.

Related

How to set custom task executor on Kafka Listener

I am running on spring-kafka:2.6.7 and I am looking for a way to set a custom task executor for my listener. Below is my Kafka configuration.
#Bean
ProducerFactory<Integer, BaseEventTemplate> eventProducerFactory() {
Map<String, Object> producerProps = new HashMap<>()
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer)
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class)
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, BaseEventTemplateSerializer.class)
producerProps.put(ProducerConfig.ACKS_CONFIG, "all")
producerProps.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 256)
return new DefaultKafkaProducerFactory<>(producerProps)
}
#Bean
KafkaTemplate<Integer, BaseEventTemplate> baseEventKafkaTemplate() {
return new KafkaTemplate<>(eventProducerFactory())
}
#Bean
ConsumerFactory<Integer, BaseEventTemplate> baseEventConsumerFactory() {
Map<String, Object> consumerProps = new HashMap<>()
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer)
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "kafkaeventconsumer")
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class)
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, BaseEventTemplateDeserializer.class)
consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false)
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
consumerProps.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, Collections.singletonList(RoundRobinAssignor.class))
return new DefaultKafkaConsumerFactory<>(consumerProps)
}
#Bean
ConcurrentKafkaListenerContainerFactory<Integer, BaseEventTemplate> baseEventKafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<Integer, BaseEventTemplate> factory =
new ConcurrentKafkaListenerContainerFactory<>()
factory.setConsumerFactory(baseEventConsumerFactory())
factory.setConcurrency(3)
factory.getContainerProperties().setPollTimeout(3000)
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE)
factory.getContainerProperties().setSyncCommits(true)
return factory
}
I have a way to set consumer task executor via factory.getContainerProperties().setConsumerTaskExecutor() but not sure how to set task executor for listener.
2.6.x is out of OSS support https://spring.io/projects/spring-kafka#support
The same thread used to poll the consumer is used to call the listener.
In very early versions (before 1.3), there were two threads due to limitations in the kafka-clients, but there is only one now (for the last 5 years).

Question on Spring Kafka Listener Consumer Offset Acknowledgement

I have created the below consumer factory.
#Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, Object> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setAutoStartup(autoStart);
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
return factory;
}
The Kafka listener is given below.
#KafkaListener(id= "${topic1}" ,
topics = "${topic1}",
groupId = "${consumer.group1}", concurrency = "1", containerFactory = "kafkaListenerContainerFactory")
public void consumeEvents1(String jsonObject, #Headers Map<String, String> header, Acknowledgment acknowledgment) {
LOG.info("Message - {}", jsonObject);
LOG.info(header.get(KafkaHeaders.GROUP_ID) + header.get(KafkaHeaders.RECEIVED_TOPIC)+String.valueOf(header.get(KafkaHeaders.OFFSET)));
acknowledgment.acknowledge();
}
In the consumer factory, I did not set factory.setBatchListener(true); My understanding is that the above listener code is called for each message as it is not a batch listener. That is what the behavior I saw. In the batch listener, I get a list of messages instead of the message by message.
As the listener is not batch-based, the acknowledgment.acknowledge() is going to have the same behavior for MANUAL, Or MANUAL_IMMEDIATE. Is that the correct understanding?
I referred to the below material.
With MANUAL, the commit is queued until the whole batch is processed; this is more efficient, but increases the possibility of getting redeliveries.
With MANUAL_IMMEDIATE, the commit occurs right away, as long as you call it on the listener thread.

Spring Data Redis - StreamMessageListenerContainer only spawning one thread

I am using spring data redis to subscribe to the 'task' redis stream to process tasks.
For some reason redis stream consumer only spawns one thread and processes one message at a time sequentially even thought I explicitly provide a Threadpool TaskExecutor.
I expect it to delegate the creation of threads to the provided Threadpool and spawn a thread up to the Threadpool configured limits. I can see that it is using the give TaskExecutor, but it's not spawning more than one thread.
Even when I don't specify my own taskExecutor, and it internally defaults to SimpleAsyncTaskExecutor, the problem still continues. Tasks are processed sequentially one at a time, one after the other, even when they are long lasting task.
What am I missing here?
#Bean
public Subscription
redisTaskStreamListenerContainer(
RedisConnectionFactory connectionFactory,
#Qualifier("task") RedisTemplate<String, Task<TransportEnvelope>> redisTemplate,
#Qualifier("task") StreamListener<String, MapRecord<String, String, String>> listener,
#Qualifier("task") Executor taskListenerExecutor) {
StreamMessageListenerContainerOptions<String, MapRecord<String, String, String>>
containerOptions = StreamMessageListenerContainerOptions.builder()
.pollTimeout(Duration.ofMillis(consumerPollTimeOutInMilli))
.batchSize(consumerReadBatchSize)
.executor(taskListenerExecutor)
.build();
StreamMessageListenerContainer<String, MapRecord<String, String, String>> container =
StreamMessageListenerContainer.create(connectionFactory, containerOptions);
StreamMessageListenerContainer.ConsumerStreamReadRequest<String> readOptions
=
StreamMessageListenerContainer.StreamReadRequest
.builder(StreamOffset.create(streamName, ReadOffset.lastConsumed()))
//turn off auto shutdown of stream consumer if an error occurs.
.cancelOnError((ex) -> false)
.consumer(Consumer.from(groupId, consumerId))
.build();
Subscription subscription = container.register(readOptions, listener);
container.start();
return subscription;
}
#Bean
#Qualifier("task")
public Executor redisListenerThreadPoolTaskExecutor() {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(30);
threadPoolTaskExecutor.setMaxPoolSize(50);
threadPoolTaskExecutor.setQueueCapacity(Integer.MAX_VALUE);
threadPoolTaskExecutor.setThreadNamePrefix("redis-listener-");
threadPoolTaskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
return threadPoolTaskExecutor;
}

Kafka: Multiple instances in the same consumer group listening to the same partition inside for topic

I have two instances of kafka consumer, configured with the same consumer group and listening to partition 0 in the same topic. The problem is when I send a message to the topic. The message is consumed by both instances which supposed not to happen as they are in the same group.
I am using Spring Boot configuration class to configure them.
Here is the configuration:
#Bean
ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<Integer, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL_IMMEDIATE);
return factory;
}
#Bean
public ConsumerFactory<Integer, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
#Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroupId);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "100");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "15000");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDeserializer);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializer);
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
return props;
}
Here is the listener:
#KafkaListener(topicPartitions = {#TopicPartition(topic = "${kafka.topic.orders}", partitions = "0")})
public void consume(ConsumerRecord<String, String> record, Acknowledgment acknowledgment) {
log.info("message received at " + orderTopic + "at partition 0");
processRecord(record, acknowledgment);
}
Kafka doesn't work like that; when you manually assign partitions like that (#TopicPartition) you are explicitly telling Kafka you want to receive messages from that partition - the consumer assign() s the partitions to itself.
In other words, with manual assignment, you are taking responsibility for distributing the partitions.
You need use group management, and let Kafka assign the topics to the instances.
use topics = "..." and Kafka will do the assignment. If you don't have enough topics, instances will be idle. You need at least as many partitions as instances to have all instances participate.

Spring kafka Batch Listener- commit offsets manually in Batch

I am implementing spring kafka batch listener, which reads list of messages from Kafka topic and posts the data to a REST service.
I would like to understand the offset management in case of the REST service goes down, the offsets for the batch should not be committed and the messages should be processed for the next poll. I have read spring kafka documentation but there is confusion in understanding the difference between Listener Error Handler and Seek to current container error handlers in batch. I am using spring-boot-2.0.0.M7 version and below is my code.
Listener Config:
#Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setConcurrency(Integer.parseInt(env.getProperty("spring.kafka.listener.concurrency")));
// factory.getContainerProperties().setPollTimeout(3000);
factory.getContainerProperties().setBatchErrorHandler(kafkaErrorHandler());
factory.getContainerProperties().setAckMode(AckMode.BATCH);
factory.setBatchListener(true);
return factory;
}
#Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> propsMap = new HashMap<>();
propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("spring.kafka.bootstrap-servers"));
propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
env.getProperty("spring.kafka.consumer.enable-auto-commit"));
propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG,
env.getProperty("spring.kafka.consumer.auto-commit-interval"));
propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, env.getProperty("spring.kafka.session.timeout"));
propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, env.getProperty("spring.kafka.consumer.group-id"));
return propsMap;
}
Listener Class:
#KafkaListener(topics = "${spring.kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory")
public void listen(List<String> payloadList) throws Exception {
if (payloadList.size() > 0)
//Post to the service
}
Kafka Error Handler:
public class KafkaErrorHandler implements BatchErrorHandler {
private static Logger LOGGER = LoggerFactory.getLogger(KafkaErrorHandler.class);
#Override
public void handle(Exception thrownException, ConsumerRecords<?, ?> data) {
LOGGER.info("Exception occured while processing::" + thrownException.getMessage());
}
}
How to handle Kafka listener so that if something happens during processing batch of records, I wouldn't loose data.
With Apache Kafka we never lose the data. There is indeed an offset in partition logs to seek to any arbitrary position.
On the other hand, when we consume records from a partition there is no requirement to commit their offsets - the current consumer holds the state in the memory. We need to commit only for other, new consumers in the same group when the current one is dead. Independently of the error, the current consumer always moves on to poll new data behind its current in-memory offset.
So, to reprocess the same data in the same consumer we definitely have to use seek operation to move the consumer back to the desired position. That's why Spring Kafka introduces SeekToCurrentErrorHandler:
This allows implementations to seek all unprocessed topic/partitions so the current record (and the others remaining) will be retrieved by the next poll. The SeekToCurrentErrorHandler does exactly this.
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_seek_to_current_container_error_handlers

Resources