Number of connections to Kafka bootstrap server - spring-boot

I have two topics with 10 partitions each, and I am using the code below to listen for messages. How many connections will be established here? Is it 20?
Will each connection be at the partition level or at the bootstrap-server level?
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Object, Object>> kafkaListenerContainerFactory() {
    final ConcurrentKafkaListenerContainerFactory<Object, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    return factory;
}

@KafkaListener(topics = "#{'${spring.kafka.topics}'.split(',')}",
        concurrency = "20",
        clientIdPrefix = "client123",
        groupId = "group123")
public void listen(final ConsumerRecord<Object, Object> inputEvent) throws Exception {
    handleMessage(inputEvent);
}

concurrency represents the number of threads; each thread creates a Consumer, and they run in parallel. Each consumer is responsible for some of the partitions of the topics. In your case you will start 20 consumer threads reading from the two topics in the same consumer group; because each topic has only 10 partitions, 10 of those threads will probably get no partitions assigned and sit idle.
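To make that concrete, here is a minimal sketch (a variant of the question's listener, not the asker's actual code) with the concurrency matched to the per-topic partition count so no thread sits idle:
@KafkaListener(topics = "#{'${spring.kafka.topics}'.split(',')}",
        concurrency = "10", // one thread per partition of each topic
        clientIdPrefix = "client123",
        groupId = "group123")
public void listen(final ConsumerRecord<Object, Object> inputEvent) {
    handleMessage(inputEvent);
}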

Related

Kafka concurrency config Spring setConsumerTaskExecutor and setConcurrency

What is the difference between setConsumerTaskExecutor() and setConcurrency() in Spring Kafka?
What happens if the max pool size of the ConsumerTaskExecutor differs from the setConcurrency() value?
ThreadPoolTaskExecutor customExecutor = new ThreadPoolTaskExecutor();
customExecutor.setCorePoolSize(3);
customExecutor.setMaxPoolSize(6);
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
factory.setConcurrency(10);
factory.getContainerProperties().setConsumerTaskExecutor(customExecutor);
Thank you
concurrency:
The maximum number of concurrent KafkaMessageListenerContainers running. Messages from within the same partition will be processed sequentially.
ConsumerTaskExecutor:
Set the executor for threads that poll the consumer.
https://docs.spring.io/spring-kafka/docs/current/api/org/springframework/kafka/listener/ConcurrentMessageListenerContainer.html#getConcurrency()
https://docs.spring.io/spring-kafka/docs/current/api/org/springframework/kafka/listener/ContainerProperties.html#setConsumerTaskExecutor(org.springframework.core.task.AsyncListenableTaskExecutor)
In other words, ConcurrentMessageListenerContainer creates one or more KafkaMessageListenerContainers based on the concurrency. Every KafkaMessageListenerContainer shares the same ContainerProperties and uses the ConsumerTaskExecutor to run a ListenerConsumer, which delegates to a Kafka consumer to do the poll.
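To make the interaction concrete, a hedged sketch building on the snippet above: each KafkaMessageListenerContainer submits one long-running polling task, so a pool smaller than the concurrency leaves containers stuck in the executor's queue (with a core pool of 3 and the default unbounded queue, only three of the ten containers would ever poll). Sizing the pool to at least the concurrency avoids that:
// One long-running polling task per container, so pool size >= concurrency
ThreadPoolTaskExecutor customExecutor = new ThreadPoolTaskExecutor();
customExecutor.setCorePoolSize(10);
customExecutor.setMaxPoolSize(10);
customExecutor.initialize();

ConcurrentKafkaListenerContainerFactory<String, String> factory =
        new ConcurrentKafkaListenerContainerFactory<>();
factory.setConcurrency(10);
factory.getContainerProperties().setConsumerTaskExecutor(customExecutor);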

Spring kafka consume records with some delay

I'm using Spring Kafka in my application. I want to add a delay of 15 minutes to the consumption of records for one of the listeners, kafkaRetryListenerContainerFactory. I have two listeners. Below is my configuration:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(primaryConsumerFactory());
    return factory;
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaRetryListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(retryConsumerFactory());
    factory.setConcurrency(this.kafkaConfigProperties.getConsumerConcurrency());
    factory.setAutoStartup(true);
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    return factory;
}
Kafka retry listener:
#KafkaListener(topics = "${spring.kafka.retry.topic}", groupId = "${spring.kafka.consumer-group-id}",
containerFactory = "kafkaRetryListenerContainerFactory", id = "retry.id")
public void retryMessage(ConsumerRecord<String, String> record, Acknowledgment acknowledgment) {
Thread.sleep(900000);
LOG.info(String.format("Consumed retry message -> %s", record.toString()));
acknowledgment.acknowledge();
}
When I added the Thread.sleep(), I started getting continuous rebalancing errors in the logs:
Attempt to heartbeat failed since group is rebalancing
My Spring Kafka version is 2.3.4.
Below are the config values:
max.poll.interval.ms = 1200000 (this is higher than the Thread.sleep() time)
heartbeat.interval.ms = 3000
session.timeout.ms = 10000
I have tried ack.nack(900000); I am still getting the rebalancing error.
Any help will be appreciated.
A filter is not the right approach; you need to Thread.sleep() the thread and make sure that max.poll.interval.ms is larger than the total sleep and processing time for the records received by the poll.
In 2.3, the container has the option to sleep between polls; with earlier versions, you have to do the sleep yourself.
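Since the question is on 2.3.4, a minimal sketch of that option, reusing the retry factory bean from the question (the 900000 ms pause must stay below the configured max.poll.interval.ms of 1200000):
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaRetryListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(retryConsumerFactory());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    // Pause between polls inside the container instead of blocking the listener thread
    factory.getContainerProperties().setIdleBetweenPolls(900000L);
    return factory;
}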
EDIT
I just found this in my server.properties (homebrew on Mac OS):
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
That explains why we see the partitions initially assigned to the first consumer.
Setting it back to the default 3000 works for me.
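For reference, that is a one-line change in server.properties (followed by a broker restart):
group.initial.rebalance.delay.ms=3000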

Rabbitmq concurrent consumers in Spring boot

I'm using the @RabbitListener annotation and a SimpleRabbitListenerContainerFactory bean for parallel execution of RabbitMQ messages, and I am setting the min and max concurrent consumers in the following way:
@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory() {
    SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory());
    factory.setConcurrentConsumers(MIN_RABBIT_CONCURRENT_CONSUMERS);
    factory.setMaxConcurrentConsumers(MAX_RABBIT_CONCURRENT_CONSUMERS);
    factory.setConsecutiveActiveTrigger(1);
    factory.setAcknowledgeMode(AcknowledgeMode.MANUAL);
    return factory;
}
The min limit is 3 and the max limit is 10. With this configuration, only 3 messages are processed in parallel, even though there are 12 messages in the queue.
Please tell me what is wrong with the config.
You can set the concurrent consumers directly on the @RabbitListener annotation:
@RabbitListener(queues = "your-queue-name", concurrency = "4")
public void customCheck(Object requestObject) {
    // process
}
With the default configuration, a new consumer will be added every 10 seconds if the other consumers are still busy.
The algorithm (and the properties that affect it) is described in the Spring AMQP documentation.
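The annotation route and the factory route can also be combined: the concurrency attribute accepts an "m-n" range (assuming a Spring AMQP version recent enough to support it), mirroring the factory's min/max settings:
@RabbitListener(queues = "your-queue-name", concurrency = "3-10")
public void customCheck(Object requestObject) {
    // process
}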

Spring kafka Batch Listener- commit offsets manually in Batch

I am implementing a Spring Kafka batch listener, which reads a list of messages from a Kafka topic and posts the data to a REST service.
I would like to understand the offset management when the REST service goes down: the offsets for the batch should not be committed, and the messages should be re-processed on the next poll. I have read the Spring Kafka documentation, but I am confused about the difference between a Listener Error Handler and the Seek to Current container error handlers for batches. I am using spring-boot-2.0.0.M7 and below is my code.
Listener Config:
@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(Integer.parseInt(env.getProperty("spring.kafka.listener.concurrency")));
    // factory.getContainerProperties().setPollTimeout(3000);
    factory.getContainerProperties().setBatchErrorHandler(kafkaErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.BATCH);
    factory.setBatchListener(true);
    return factory;
}
@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> propsMap = new HashMap<>();
    propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("spring.kafka.bootstrap-servers"));
    propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
            env.getProperty("spring.kafka.consumer.enable-auto-commit"));
    propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG,
            env.getProperty("spring.kafka.consumer.auto-commit-interval"));
    propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, env.getProperty("spring.kafka.session.timeout"));
    propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, env.getProperty("spring.kafka.consumer.group-id"));
    return propsMap;
}
Listener Class:
#KafkaListener(topics = "${spring.kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory")
public void listen(List<String> payloadList) throws Exception {
if (payloadList.size() > 0)
//Post to the service
}
Kafka Error Handler:
public class KafkaErrorHandler implements BatchErrorHandler {

    private static final Logger LOGGER = LoggerFactory.getLogger(KafkaErrorHandler.class);

    @Override
    public void handle(Exception thrownException, ConsumerRecords<?, ?> data) {
        LOGGER.info("Exception occurred while processing::" + thrownException.getMessage());
    }
}
How should I handle the Kafka listener so that if something happens while processing a batch of records, I don't lose data?
With Apache Kafka we never lose data; there is always an offset in the partition log that lets us seek to any arbitrary position.
On the other hand, when we consume records from a partition there is no requirement to commit their offsets: the current consumer holds its position in memory. We need to commit only for the benefit of new consumers in the same group that take over when the current one dies. Independently of any error, the current consumer always moves on to poll new data past its current in-memory offset.
So, to reprocess the same data in the same consumer, we definitely have to use the seek operation to move the consumer back to the desired position. That's why Spring Kafka introduces SeekToCurrentErrorHandler:
This allows implementations to seek all unprocessed topic/partitions so the current record (and the others remaining) will be retrieved by the next poll. The SeekToCurrentErrorHandler does exactly this.
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_seek_to_current_container_error_handlers
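For a batch listener, a hedged sketch of that seek logic as a ConsumerAwareBatchErrorHandler (SeekingBatchErrorHandler is a made-up name for illustration; newer Spring Kafka versions ship a ready-made SeekToCurrentBatchErrorHandler):
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.listener.ConsumerAwareBatchErrorHandler;

public class SeekingBatchErrorHandler implements ConsumerAwareBatchErrorHandler {

    @Override
    public void handle(Exception thrownException, ConsumerRecords<?, ?> records, Consumer<?, ?> consumer) {
        // Seek every partition of the failed batch back to its first record
        // so the whole batch is redelivered by the next poll()
        for (TopicPartition tp : records.partitions()) {
            consumer.seek(tp, records.records(tp).get(0).offset());
        }
    }
}
It would be registered the same way as the question's handler, via factory.getContainerProperties().setBatchErrorHandler(new SeekingBatchErrorHandler());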

Spring Kafka listenerExecutor

I'm setting up a Kafka listener in a Spring Boot application and I can't seem to get the listener running in a pool using an executor. Here's my Kafka configuration:
@Bean
ThreadPoolTaskExecutor messageProcessorExecutor() {
    logger.info("Creating a message processor pool with {} threads", numThreads);
    ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
    exec.setCorePoolSize(200);
    exec.setMaxPoolSize(200);
    exec.setKeepAliveSeconds(30);
    exec.setAllowCoreThreadTimeOut(true);
    exec.setQueueCapacity(0); // Yields a SynchronousQueue
    exec.setThreadFactory(ThreadFactoryFactory.defaultNamingFactory("kafka", "processor"));
    return exec;
}
@Bean
public ConsumerFactory<String, PollerJob> consumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    DefaultKafkaConsumerFactory<String, PollerJob> factory = new DefaultKafkaConsumerFactory<>(props,
            new StringDeserializer(),
            new JsonDeserializer<>(PollerJob.class));
    return factory;
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, PollerJob> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, PollerJob> factory
            = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(Integer.valueOf(kafkaThreads));
    factory.getContainerProperties().setListenerTaskExecutor(messageProcessorExecutor());
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
    return factory;
}
The ThreadFactoryFactory used by the ThreadPoolTaskExecutor just makes sure the thread is named like 'kafka-1-processor-1'.
The ConsumerFactory has the ENABLE_AUTO_COMMIT_CONFIG flag set to false and I'm using manual mode for the acknowledgement which is required to use executors according to the documentation.
My listener looks like this:
#KafkaListener(topics = "my_topic",
group = "my_group",
containerFactory = "kafkaListenerContainerFactory")
public void listen(#Payload SomeJob job, Acknowledgment ack) {
ack.acknowledge();
logger.info("Running job {}", job.getId());
....
}
Using the Admin Server I can inspect all the threads, and only one kafka-N-processor-N thread is being created, but I expected to see up to 200. The jobs are all running one at a time on that one thread, and I can't figure out why.
How can I get this setup to run the listeners on my executor with as many threads as possible?
I'm using Spring Boot 1.5.4.RELEASE and Kafka 0.11.0.0.
If your topic has only one partition, then according to the consumer group policy only one consumer is able to poll that partition.
The ConcurrentMessageListenerContainer does create as many target KafkaMessageListenerContainer instances as the provided concurrency, but only when it doesn't know the number of partitions in the topic.
When the rebalance happens in the consumer group, only one consumer gets the partition to consume from. All the work is really done there in a single thread:
private void startInvoker() {
    ListenerConsumer.this.invoker = new ListenerInvoker();
    ListenerConsumer.this.listenerInvokerFuture = this.containerProperties.getListenerTaskExecutor()
            .submit(ListenerConsumer.this.invoker);
}
One partition, one thread: records are processed sequentially.
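The practical fix, then, is more partitions rather than a bigger executor. A hedged sketch, assuming spring-kafka 2.0+ where a KafkaAdmin bean provisions NewTopic beans (on the 1.5.x/0.11 stack from the question, the partition count can instead be raised with the kafka-topics command-line tool):
@Bean
public NewTopic myTopic() {
    // 10 partitions allow up to 10 consumers in the same group to poll in parallel
    return new NewTopic("my_topic", 10, (short) 1);
}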
