Spring Data Redis - StreamMessageListenerContainer only spawning one thread - spring

I am using spring data redis to subscribe to the 'task' redis stream to process tasks.
For some reason the Redis stream consumer spawns only one thread and processes one message at a time, sequentially, even though I explicitly provide a ThreadPoolTaskExecutor.
I expect it to delegate thread creation to the provided thread pool and spawn threads up to the pool's configured limits. I can see that it is using the given TaskExecutor, but it's not spawning more than one thread.
Even when I don't specify my own taskExecutor and it internally defaults to SimpleAsyncTaskExecutor, the problem persists. Tasks are processed sequentially, one after the other, even when they are long-running.
What am I missing here?
@Bean
public Subscription
redisTaskStreamListenerContainer(
RedisConnectionFactory connectionFactory,
@Qualifier("task") RedisTemplate<String, Task<TransportEnvelope>> redisTemplate,
@Qualifier("task") StreamListener<String, MapRecord<String, String, String>> listener,
@Qualifier("task") Executor taskListenerExecutor) {
StreamMessageListenerContainerOptions<String, MapRecord<String, String, String>>
containerOptions = StreamMessageListenerContainerOptions.builder()
.pollTimeout(Duration.ofMillis(consumerPollTimeOutInMilli))
.batchSize(consumerReadBatchSize)
.executor(taskListenerExecutor)
.build();
StreamMessageListenerContainer<String, MapRecord<String, String, String>> container =
StreamMessageListenerContainer.create(connectionFactory, containerOptions);
StreamMessageListenerContainer.ConsumerStreamReadRequest<String> readOptions
=
StreamMessageListenerContainer.StreamReadRequest
.builder(StreamOffset.create(streamName, ReadOffset.lastConsumed()))
//turn off auto shutdown of stream consumer if an error occurs.
.cancelOnError((ex) -> false)
.consumer(Consumer.from(groupId, consumerId))
.build();
Subscription subscription = container.register(readOptions, listener);
container.start();
return subscription;
}
@Bean
@Qualifier("task")
public Executor redisListenerThreadPoolTaskExecutor() {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(30);
threadPoolTaskExecutor.setMaxPoolSize(50);
threadPoolTaskExecutor.setQueueCapacity(Integer.MAX_VALUE);
threadPoolTaskExecutor.setThreadNamePrefix("redis-listener-");
threadPoolTaskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
return threadPoolTaskExecutor;
}
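A possible workaround, sketched with plain JDK classes only. It assumes (this is not confirmed anywhere in the thread) that the container runs one polling task per registered subscription and calls the listener on that same thread, so the executor only ever receives that one task. Under that assumption, the listener itself can hand each record to a worker pool. The class name, the record values and the pool size below are made up for illustration; the Consumer stands in for StreamListener.onMessage.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ListenerHandOffSketch {

    // Simulates one polling thread delivering records in order and
    // returns the names of the worker threads that processed them.
    static Set<String> deliver(List<String> records, int poolSize) {
        ExecutorService workers = Executors.newFixedThreadPool(poolSize);
        Set<String> workerNames = ConcurrentHashMap.newKeySet();

        // Hypothetical stand-in for StreamListener.onMessage: it only queues
        // the work and returns, so the polling loop is never blocked by a task.
        Consumer<String> listener = record -> workers.submit(() -> {
            workerNames.add(Thread.currentThread().getName());
            pause(100); // pretend this is a long-running task
        });

        // Stand-in for the container's sequential poll loop.
        for (String record : records) {
            listener.accept(record);
        }

        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return workerNames;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int used = deliver(List.of("t1", "t2", "t3", "t4"), 4).size();
        System.out.println(used + " worker threads used");
    }
}
```

The trade-off is that records are no longer processed in stream order, and any acknowledgement would have to happen when the worker finishes rather than when the listener returns.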

Related

Spring Integration - Scatter-Gather

I am using Spring Integration and its Scatter-Gather handler (https://docs.spring.io/spring-integration/docs/5.3.0.M1/reference/html/scatter-gather.html) to send 3 parallel requests (using ExecutorChannels) to external REST APIs and aggregate their responses into a single message.
Everything works fine until an exception is thrown within the aggregator's aggregatePayloads method (AggregatingMessageHandler). In this scenario the error message is successfully delivered to the Messaging Gateway that initiated the flow (the caller). However, the ScatterGatherHandler thread remains hanging, waiting (I believe) for a gatherer reply that never arrives because of the exception. That is, each successive call leaves one more thread stuck, and eventually the thread pool runs out of available worker threads.
My current Scatter Gather configuration:
@Bean
public MessageHandler distributor() {
RecipientListRouter router = new RecipientListRouter();
router.setChannels(Arrays.asList(Channel1(asyncExecutor()),Channel2(asyncExecutor()),Channel3(asyncExecutor())));
return router;
}
@Bean
public MessageHandler gatherer() {
AggregatingMessageHandler aggregatingMessageHandler = new AggregatingMessageHandler(
new TransactionAggregator(),
new SimpleMessageStore(),
new HeaderAttributeCorrelationStrategy("correlationID"),
new ExpressionEvaluatingReleaseStrategy("size() == 3"));
aggregatingMessageHandler.setExpireGroupsUponCompletion( true );
return aggregatingMessageHandler;
}
@Bean
@ServiceActivator(inputChannel = "validationOutputChannel")
public MessageHandler scatterGatherDistribution() {
ScatterGatherHandler handler = new ScatterGatherHandler(distributor(), gatherer());
handler.setErrorChannelName("scatterGatherErrorChannel");
return handler;
}
@Bean("taskExecutor")
@Primary
public TaskExecutor asyncExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(4);
executor.setMaxPoolSize(10);
executor.setQueueCapacity(100);
executor.setThreadNamePrefix("AsyncThread-");
executor.initialize();
return executor;
}
So far the only solution that I found is to add RequiresReply and GatherTimeout values for ScatterGatherHandler like below:
handler.setGatherTimeout(120000L);
handler.setRequiresReply(true);
This produces an exception and releases the ScatterGatherHandler's thread back to the pool after the specified timeout, once the aggregator's exception has been delivered to the messaging gateway. I can see the following message in the log:
[AsyncThread-1] [WARN] [o.s.m.c.GenericMessagingTemplate$TemporaryReplyChannel:] [{}] - Reply message received but the receiving thread has already received a reply: ErrorMessage
Is there any other way to achieve this? My main goal is to make sure that I am not blocking any threads when an exception is thrown within the aggregator's aggregatePayloads method.
Thank you.
Technically, this is expected behavior. See the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#scatter-gather-error-handling
In this case a reasonable, finite gatherTimeout must be configured for the ScatterGatherHandler. Otherwise, by default, it will block forever waiting for a reply from the gatherer.
There is really no way to break out of the BlockingQueue.take() call in that ScatterGatherHandler code.
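To illustrate the difference a finite timeout makes, here is a plain-JDK sketch. It is not Spring Integration's code; the class and method names are made up. It only contrasts an unbounded BlockingQueue.take() with a bounded BlockingQueue.poll(timeout, unit), which is roughly what configuring a gatherTimeout buys you.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class GatherTimeoutSketch {

    // Waits for a reply for at most timeoutMillis; returns null when none arrives.
    static String awaitReply(BlockingQueue<String> replies, long timeoutMillis) {
        try {
            return replies.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> replies = new LinkedBlockingQueue<>();

        // The aggregator failed, so nothing is ever put on the queue.
        // replies.take() would park this thread forever at this point.
        System.out.println(awaitReply(replies, 200)); // prints "null"

        // When a reply does arrive, the bounded wait returns it immediately.
        replies.offer("aggregated");
        System.out.println(awaitReply(replies, 200)); // prints "aggregated"
    }
}
```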

Spring Batch Parallel processing with JMS

I implemented a Spring Batch project that reads from a WebLogic JMS queue (with a custom item reader, not message-driven) and passes the JMS message data to an item writer (chunk = 1), where I call some APIs and write to a database.
However, I am trying to implement parallel JMS processing: reading JMS messages in parallel and passing them to the writer without waiting for the previous ones to complete.
I used a DefaultMessageListenerContainer in a previous project, and it offers parallel consumption of JMS messages, but in this project I have to use the Spring Batch framework.
I tried the easiest solution (a multi-threaded step), but it didn't work: JmsException: "invalid blocking receive when another receive is in progress", which probably means that my reader is stateful.
I thought about using remote partitioning, but then I would have to read all the messages and put the data into step execution contexts before calling the slave steps, which isn't really efficient when dealing with a large number of messages.
I looked a little into remote chunking. I understand that it passes data via queue channels, but I can't see the point of reading from a JMS queue and putting the messages in a local queue for slave workers.
How can I approach this?
My code:
@Bean
Step step1() {
return steps.get("step1").<Message, DetectionIncoherenceLiqJmsOut>chunk(1)
.reader(reader()).processor(processor()).writer(writer())
.listener(stepListener()).build();
}
@Bean
Job job(@Qualifier("step1") Step step1) {
return jobs.get("job").start(step1).build();
}
Jms Code :
@Override
public void initQueueConnection() throws NamingException, JMSException {
Hashtable<String, String> properties = new Hashtable<String, String>();
properties.put(Context.INITIAL_CONTEXT_FACTORY, env.getProperty(WebLogicConstant.JNDI_FACTORY));
properties.put(Context.PROVIDER_URL, env.getProperty(WebLogicConstant.JMS_WEBLOGIC_URL_RECEIVE));
InitialContext vInitialContext = new InitialContext(properties);
QueueConnectionFactory vQueueConnectionFactory = (QueueConnectionFactory) vInitialContext
.lookup(env.getProperty(WebLogicConstant.JMS_FACTORY_RECEIVE));
vQueueConnection = vQueueConnectionFactory.createQueueConnection();
vQueueConnection.start();
vQueueSession = vQueueConnection.createQueueSession(false, 0);
Queue vQueue = (Queue) vInitialContext.lookup(env.getProperty(WebLogicConstant.JMS_QUEUE_RECEIVE));
consumer = vQueueSession.createConsumer(vQueue, "JMSCorrelationID IS NOT NULL");
}
@Override
public Message receiveMessages() throws NamingException, JMSException {
return consumer.receive(20000);
}
Item reader :
@Override
public Message read() throws Exception {
return jmsServiceReceiver.receiveMessages();
}
Thanks ! i'll appreciate the help :)
There's a BatchMessageListenerContainer in the spring-batch-infrastructure-tests sub project.
https://github.com/spring-projects/spring-batch/blob/d8fc58338d3b059b67b5f777adc132d2564d7402/spring-batch-infrastructure-tests/src/main/java/org/springframework/batch/container/jms/BatchMessageListenerContainer.java
Message listener container adapted for intercepting the message reception with advice provided through configuration.
To enable batching of messages in a single transaction, use the TransactionInterceptor and the RepeatOperationsInterceptor in the advice chain (with or without a transaction manager set in the base class). Instead of receiving a single message and processing it, the container will then use a RepeatOperations to receive multiple messages in the same thread. Use with a RepeatOperations and a transaction interceptor. If the transaction interceptor uses XA then use an XA connection factory, or else the TransactionAwareConnectionFactoryProxy to synchronize the JMS session with the ongoing transaction (opening up the possibility of duplicate messages after a failure). In the latter case you will not need to provide a transaction manager in the base class - it only gets on the way and prevents the JMS session from synchronizing with the database transaction.
Perhaps you could adapt it for your use case.
I was able to do so with a multi-threaded step:
// Jobs and Steps
@Bean
Step stepDetectionIncoherencesLiq(@Autowired StepBuilderFactory steps) {
int threadSize = Integer.parseInt(env.getProperty(PropertyConstant.THREAD_POOL_SIZE));
return steps.get("stepDetectionIncoherencesLiq").<Message, DetectionIncoherenceLiqJmsOut>chunk(1)
.reader(reader()).processor(processor()).writer(writer())
.readerIsTransactionalQueue()
.faultTolerant()
.taskExecutor(taskExecutor())
.throttleLimit(threadSize)
.listener(stepListener())
.build();
}
And a JmsItemReader backed by a JmsTemplate instead of creating sessions and connections explicitly. It manages the connections, so I no longer get the JMS exception (JmsException: "invalid blocking receive when another receive is in progress"):
@Bean
public JmsItemReader<Message> reader() {
JmsItemReader<Message> itemReader = new JmsItemReader<>();
itemReader.setItemType(Message.class);
itemReader.setJmsTemplate(jmsTemplate());
return itemReader;
}

Spring Integration Service Activator handler business logic

I am new to Spring Integration.
I am trying to poll multiple file locations asynchronously with the Spring Integration Java DSL. I need to get the file name, perform some operations with it, and finally push the file to S3. My question is: should these file operations be performed in the task executor or in the service activator handler? I am not sure which is the right place.
@Autowired
private AWSFileManager awsFileManager;
@Bean
public IntegrationFlow inboundChannelFlow(@Value("${file.poller.delay}") long delay,
@Value("${file.poller.messages}") int maxMsgsPerPoll,
TaskExecutor taskExecutor, MessageSource<File> fileSource)
{
return IntegrationFlows.from(fileSource,
c -> c.poller(Pollers.fixedDelay(delay)
.taskExecutor(taskExecutor)
.maxMessagesPerPoll(maxMsgsPerPoll)))
.handle("AWSFileManager", "fileUpload")
.channel(ApplicationConfiguration.inboundChannel)
.get();
}
@Bean
TaskExecutor taskExecutor(@Value("${file.poller.thread.pool.size}") int poolSize) {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
//Runnable task1 = () -> {this.methodsamp();};
taskExecutor.setCorePoolSize(poolSize);
//taskExecutor.execute(task1);
return taskExecutor;
}
@Async
public void methodsamp()
{
try
{
awsFileManager.fileUpload();
System.out.println("test");
}
catch(Exception ex)
{
// NOTE: the exception is silently swallowed here
}
}
I have attached the sample code here.
Also is there a way I could retrieve the filename of the files in the channel as I need to pass this as parameter to the fileUpload method.
Please advise.
Your question isn't clear. The TaskExecutor provides the thread context for the flow. The Service Activator (.handle()) is exactly the place for your business logic method, and it can run on a thread from the executor. You are using both correctly in your IntegrationFlow.
The FileReadingMessageSource produces messages with a java.io.File as the payload. So the way to get the file name is simply File.getName().
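A minimal sketch of that last point, using only java.io.File. The class name, method name and path are hypothetical; in the real flow the File would arrive as the message payload passed to your handler method.

```java
import java.io.File;

class FileNameSketch {

    // The payload type is java.io.File, so the name comes straight from getName().
    static String fileNameOf(File payload) {
        return payload.getName();
    }

    public static void main(String[] args) {
        File payload = new File("/data/inbound/report.csv"); // hypothetical path
        System.out.println(fileNameOf(payload)); // prints "report.csv"
    }
}
```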

Spring kafka Batch Listener- commit offsets manually in Batch

I am implementing a Spring Kafka batch listener, which reads a list of messages from a Kafka topic and posts the data to a REST service.
I would like to understand offset management when the REST service goes down: the offsets for the batch should not be committed, and the messages should be processed again on the next poll. I have read the Spring Kafka documentation, but I am confused about the difference between the Listener Error Handler and the Seek To Current container error handlers for batches. I am using spring-boot-2.0.0.M7, and my code is below.
Listener Config:
@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setConcurrency(Integer.parseInt(env.getProperty("spring.kafka.listener.concurrency")));
// factory.getContainerProperties().setPollTimeout(3000);
factory.getContainerProperties().setBatchErrorHandler(kafkaErrorHandler());
factory.getContainerProperties().setAckMode(AckMode.BATCH);
factory.setBatchListener(true);
return factory;
}
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> propsMap = new HashMap<>();
propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("spring.kafka.bootstrap-servers"));
propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
env.getProperty("spring.kafka.consumer.enable-auto-commit"));
propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG,
env.getProperty("spring.kafka.consumer.auto-commit-interval"));
propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, env.getProperty("spring.kafka.session.timeout"));
propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, env.getProperty("spring.kafka.consumer.group-id"));
return propsMap;
}
Listener Class:
@KafkaListener(topics = "${spring.kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory")
public void listen(List<String> payloadList) throws Exception {
if (payloadList.size() > 0)
//Post to the service
}
Kafka Error Handler:
public class KafkaErrorHandler implements BatchErrorHandler {
private static Logger LOGGER = LoggerFactory.getLogger(KafkaErrorHandler.class);
@Override
public void handle(Exception thrownException, ConsumerRecords<?, ?> data) {
LOGGER.info("Exception occurred while processing::" + thrownException.getMessage());
}
}
How should I handle the Kafka listener so that, if something goes wrong while processing a batch of records, I don't lose data?
With Apache Kafka we never lose data. There is an offset in the partition logs that lets us seek to any arbitrary position.
On the other hand, when we consume records from a partition there is no requirement to commit their offsets: the current consumer holds its state in memory. We need to commit only for the benefit of other, new consumers in the same group when the current one dies. Regardless of any error, the current consumer always moves on to poll new data past its current in-memory offset.
So, to reprocess the same data in the same consumer, we have to use a seek operation to move the consumer back to the desired position. That's why Spring Kafka introduces SeekToCurrentErrorHandler:
This allows implementations to seek all unprocessed topic/partitions so the current record (and the others remaining) will be retrieved by the next poll. The SeekToCurrentErrorHandler does exactly this.
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_seek_to_current_container_error_handlers
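The in-memory position idea can be shown with a toy model. This is not the Kafka client API: ToyConsumer, its methods and the record values are all invented for illustration. It only shows that a failed batch is skipped unless the position is moved back with a seek before the next poll.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a consumer reading one partition log.
class ToyConsumer {
    private final List<String> log;
    private int position; // in-memory position; advances on every poll

    ToyConsumer(List<String> log) {
        this.log = log;
    }

    // Returns up to max records and moves the position past them.
    List<String> poll(int max) {
        int end = Math.min(position + max, log.size());
        List<String> batch = new ArrayList<>(log.subList(position, end));
        position = end;
        return batch;
    }

    void seek(int offset) {
        position = offset;
    }

    int position() {
        return position;
    }
}

class SeekToCurrentSketch {

    // Polls a batch, pretends it failed, and returns what the next poll yields.
    static List<String> nextBatchAfterFailure(boolean seekBack) {
        ToyConsumer consumer = new ToyConsumer(List.of("r0", "r1", "r2", "r3"));
        int batchStart = consumer.position();
        consumer.poll(2); // this batch "fails" in the listener
        if (seekBack) {
            consumer.seek(batchStart); // rewind so the failed records are read again
        }
        return consumer.poll(2);
    }

    public static void main(String[] args) {
        System.out.println(nextBatchAfterFailure(false)); // prints "[r2, r3]"
        System.out.println(nextBatchAfterFailure(true));  // prints "[r0, r1]"
    }
}
```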

Spring Kafka listenerExecutor

I'm setting up a kafka listener in a spring boot application and I can't seem to get the listener running in a pool using an executor. Here's my kafka configuration:
@Bean
ThreadPoolTaskExecutor messageProcessorExecutor() {
logger.info("Creating a message processor pool with {} threads", numThreads);
ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(200);
exec.setMaxPoolSize(200);
exec.setKeepAliveSeconds(30);
exec.setAllowCoreThreadTimeOut(true);
exec.setQueueCapacity(0); // Yields a SynchronousQueue
exec.setThreadFactory(ThreadFactoryFactory.defaultNamingFactory("kafka", "processor"));
return exec;
}
@Bean
public ConsumerFactory<String, PollerJob> consumerFactory() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup);
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
DefaultKafkaConsumerFactory<String, PollerJob> factory = new DefaultKafkaConsumerFactory<>(props,
new StringDeserializer(),
new JsonDeserializer<>(PollerJob.class));
return factory;
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, PollerJob> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, PollerJob> factory
= new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setConcurrency(Integer.valueOf(kafkaThreads));
factory.getContainerProperties().setListenerTaskExecutor(messageProcessorExecutor());
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
return factory;
}
The ThreadFactoryFactory used by the ThreadPoolTaskExecutor just makes sure the thread is named like 'kafka-1-processor-1'.
The ConsumerFactory has the ENABLE_AUTO_COMMIT_CONFIG flag set to false and I'm using manual mode for the acknowledgement which is required to use executors according to the documentation.
My listener looks like this:
@KafkaListener(topics = "my_topic",
group = "my_group",
containerFactory = "kafkaListenerContainerFactory")
public void listen(@Payload SomeJob job, Acknowledgment ack) {
ack.acknowledge();
logger.info("Running job {}", job.getId());
....
}
Using the Admin Server I can inspect all the threads, and only one kafka-N-processor-N thread is being created, but I expected to see up to 200. The jobs are all running one at a time on that one thread and I can't figure out why.
How can I get this setup to run the listeners using my executor with as many threads as possible?
I'm using Spring Boot 1.5.4.RELEASE and kafka 0.11.0.0.
If your topic has only one partition then, according to the consumer group policy, only one consumer is able to poll that partition.
The ConcurrentMessageListenerContainer does create as many target KafkaMessageListenerContainer instances as the provided concurrency. And it does that only in case it doesn't know the number of partitions in the topic.
When the rebalance in the consumer group happens, only one consumer gets the partition for consuming. All the work is then done there in a single thread:
private void startInvoker() {
ListenerConsumer.this.invoker = new ListenerInvoker();
ListenerConsumer.this.listenerInvokerFuture = this.containerProperties.getListenerTaskExecutor()
.submit(ListenerConsumer.this.invoker);
}
One partition - one thread for sequential records processing.
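A toy model of why extra consumers stay idle. This is not Kafka's actual assignor; the class name and the round-robin rule are assumptions made for illustration. It just hands each partition to exactly one consumer and reports which consumers ended up with work.

```java
import java.util.Set;
import java.util.TreeSet;

class PartitionAssignmentSketch {

    // Toy round-robin assignment: partition p goes to consumer p % consumers.
    // Returns the indexes of the consumers that received at least one partition.
    static Set<Integer> busyConsumers(int partitions, int consumers) {
        Set<Integer> busy = new TreeSet<>();
        for (int p = 0; p < partitions; p++) {
            busy.add(p % consumers);
        }
        return busy;
    }

    public static void main(String[] args) {
        System.out.println(busyConsumers(1, 5)); // prints "[0]"
        System.out.println(busyConsumers(8, 5)); // prints "[0, 1, 2, 3, 4]"
    }
}
```

With a single partition, raising the concurrency or the executor's pool size changes nothing in this model; only adding partitions gives the other consumers something to do.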
