I'm using Spring Integration with Redis. The producer uses RedisQueueOutboundGateway, and on the other side the receiver has a flow defined with RedisQueueInboundGateway.
Reading from the documentation I found the following sentence
The task-executor has to be configured with more than one thread for processing
My need is to have concurrent executions, in order to speed up the elaboration of requests, but I can see there is always one thread even if I configured a custom ThreadPoolTaskExecutor like the following
public Executor getAsyncExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(5);
executor.setMaxPoolSize(40);
executor.setQueueCapacity(40);
executor.setThreadNamePrefix("QueueAsyncExecutor-");
executor.initialize();
return executor;
}
and the use of this threadpool is
final RedisQueueInboundGateway rqig = new RedisQueueInboundGateway(finalDestination, jedisConnectionFactory);
rqig.setTaskExecutor(getAsyncExecutor());
The final result is sequential processing of the requests, all done on the same thread, as I can see from the log. Is it possible to enable multithreaded processing in this situation? How?
That's correct. The RedisQueueInboundGateway is single-threaded for now. There is only one ListenerTask:
private void restart() {
this.taskExecutor.execute(new ListenerTask());
}
Sounds like we need to introduce a concurrency option into RedisQueueInboundGateway! Feel free to raise a JIRA on the matter; contributions are welcome!
You may achieve an artificial concurrency with several RedisQueueInboundGateway instances for the same Redis queue. This way each of them will start its own ListenerTask.
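To illustrate why the pool size alone makes no difference, here is a minimal sketch using only java.util.concurrent, not Spring Integration itself. The class ListenerTaskDemo and its method threadsUsed are hypothetical names, and the polling loop only stands in for the gateway's ListenerTask. With one long-lived task submitted, a 5-thread pool still does all the work on one thread; submitting several tasks (comparable to several gateway instances on the same queue) spreads the work.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ListenerTaskDemo {

    // Submits `listenerTasks` long-lived polling loops to a pool of `poolSize`
    // threads and returns how many distinct threads actually handled messages.
    static int threadsUsed(int poolSize, int listenerTasks, int messages) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(messages);
        try {
            for (int i = 0; i < listenerTasks; i++) {
                pool.execute(() -> {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            Integer message = queue.poll(100, TimeUnit.MILLISECONDS);
                            if (message == null) {
                                continue; // nothing to do yet, poll again
                            }
                            Thread.sleep(10); // simulated request processing
                            threadNames.add(Thread.currentThread().getName());
                            done.countDown();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // stop the loop
                    }
                });
            }
            for (int i = 0; i < messages; i++) {
                queue.add(i);
            }
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for messages");
            }
            return threadNames.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the polling loops
        }
    }

    public static void main(String[] args) {
        System.out.println("1 listener task:  " + threadsUsed(5, 1, 20) + " thread(s)");
        System.out.println("3 listener tasks: " + threadsUsed(5, 3, 30) + " thread(s)");
    }
}
```

The first call always reports a single thread, because an executor only runs as many threads as it has tasks to run.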
I'm hoping this is a simple configuration issue but I can't seem to figure out what it might be.
Set-up
Spring-Boot 2.2.2.RELEASE
cloud-starter
cloud-starter-aws
spring-jms
spring-cloud-dependencies Hoxton.SR1
amazon-sqs-java-messaging-lib 1.0.8
Problem
My application starts up fine and begins to process messages from Amazon SQS. After some amount of time I see the following warning
2020-02-01 04:16:21.482 LogLevel=WARN 1 --- [ecutor-thread14] o.s.j.l.DefaultMessageListenerContainer : Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers.
The above warning gets printed multiple times and eventually I see the following two INFO messages
2020-02-01 04:17:51.552 LogLevel=INFO 1 --- [ecutor-thread40] c.a.s.javamessaging.SQSMessageConsumer : Shutting down ConsumerPrefetch executor
2020-02-01 04:18:06.640 LogLevel=INFO 1 --- [ecutor-thread40] com.amazon.sqs.javamessaging.SQSSession : Shutting down SessionCallBackScheduler executor
The above 2 messages will display several times and at some point no more messages are consumed from SQS. I don't see any other messages in my log to indicate an issue, but I get no messages from my handlers that they are processing messages (I have 2~) and I can see the AWS SQS queue growing in the number of messages and the age.
~: This exact code was working fine when I had a single handler, this problem started when I added the second one.
Configuration/Code
The first "WARNing" I realize is caused by the concurrency of the ThreadPoolTaskExecutor, but I cannot get a configuration which works properly. Here is my current configuration for the JMS stuff. I have tried various levels of max pool size with no real effect other than the warnings starting sooner or later based on the pool size
public ThreadPoolTaskExecutor asyncAppConsumerTaskExecutor() {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setThreadGroupName("asyncConsumerTaskExecutor");
taskExecutor.setThreadNamePrefix("asyncConsumerTaskExecutor-thread");
taskExecutor.setCorePoolSize(10);
// Allow the thread pool to grow up to 4 times the core size, evidently not
// having the pool be larger than the max concurrency causes the JMS queue
// to barf on itself with messages like
// "Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers"
taskExecutor.setMaxPoolSize(10 * 4);
taskExecutor.setQueueCapacity(0); // do not queue up messages
taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
taskExecutor.setAwaitTerminationSeconds(60);
return taskExecutor;
}
Here is the JMS Container Factory we create
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(SQSConnectionFactory sqsConnectionFactory, ThreadPoolTaskExecutor asyncConsumerTaskExecutor) {
DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
factory.setConnectionFactory(sqsConnectionFactory);
factory.setDestinationResolver(new DynamicDestinationResolver());
// The JMS processor will start 'concurrency' number of tasks
// and supposedly will increase this to the max of '10 * 3'
factory.setConcurrency(10 + "-" + (10 * 3));
factory.setTaskExecutor(asyncConsumerTaskExecutor);
// Let the task process 100 messages, default appears to be 10
factory.setMaxMessagesPerTask(100);
// Wait up to 5 seconds for a timeout, this keeps the task around a bit longer
factory.setReceiveTimeout(5000L);
factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
return factory;
}
I added the setMaxMessagesPerTask & setReceiveTimeout calls based on stuff found on the internet, the problem persists without these and at various settings (50, 2500L, 25, 1000L, etc...)
We create a default SQS connection factory
public SQSConnectionFactory sqsConnectionFactory(AmazonSQS amazonSQS) {
return new SQSConnectionFactory(new ProviderConfiguration(), amazonSQS);
}
Finally the handlers look like this
@JmsListener(destination = "consumer-event-queue")
public void receiveEvents(String message) throws IOException {
MyEventDTO myEventDTO = jsonObj.readValue(message, MyEventDTO.class);
//messageTask.process(myEventDTO);
}
@JmsListener(destination = "myalert-sqs")
public void receiveAlerts(String message) throws IOException, InterruptedException {
final MyAlertDTO myAlert = jsonObj.readValue(message, MyAlertDTO.class);
myProcessor.addAlertToQueue(myAlert);
}
You can see in the first function (receiveEvents) we just take the message from the queue and exit, we have not implemented the processing code for that.
The second function (receiveAlerts) gets the message, the myProcessor.addAlertToQueue function creates a runnable object and submits it to a threadpool to be processed at some point in the future.
The problem (the warning, the INFO messages and the failure to consume messages) only started when we added the receiveAlerts function; previously the other function was the only one present and we did not see this behavior.
More
This is part of a larger project and I am working on breaking this code out into a smaller test case to see if I can duplicate this issue. I will post a follow-up with the results.
In the Mean Time
I'm hoping this is just a config issue and someone more familiar with this can tell me what I'm doing wrong, or that someone can provide some thoughts and comments on how to correct this to work properly.
Thank you!
After fighting this one for a bit I think I finally resolved it.
The issue appears to be due to the "DefaultJmsListenerContainerFactory": this factory creates a new "DefaultMessageListenerContainer" for EACH method with a '@JmsListener' annotation. The person who originally wrote the code thought it was only called once for the application and that the created container would be reused. So the issue was two-fold:
1. The 'ThreadPoolTaskExecutor' attached to the factory had 40 threads. When the application had one '@JmsListener' method this worked fine, but when we added a second method, each method got 10 threads (20 in total) for listening. That is fine in itself; however, since we stated that each listener could grow up to 30 consumers, the two listeners together could ask for up to 60 threads and we quickly ran out of threads in the 40-thread pool. This caused the "Number of scheduled consumers has dropped below concurrentConsumers limit" warning.
2. This is probably obvious given the above, but I wanted to call it out explicitly. In the listener factory we set the concurrency to "10-30"; however, all of the listeners have to share that pool. As such, the max concurrency has to be set so that if every listener scales to its maximum, the total doesn't exceed the maximum number of threads in the pool (e.g. if we have 2 '@JmsListener' annotated methods and a pool with 40 threads, then the max value can be no more than 20).
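The arithmetic can be reproduced with a small sketch that uses a plain ThreadPoolExecutor rather than the Spring classes. SharedPoolDemo and rejected are hypothetical names, and the sketch assumes every consumer holds its thread for as long as it runs, which simplifies how the real container schedules consumers. A SynchronousQueue plays the role of setQueueCapacity(0): tasks are never queued, so anything beyond the pool maximum is rejected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SharedPoolDemo {

    // Returns how many consumer tasks are rejected when `listeners` containers
    // each try to scale up to `maxPerListener` consumers on one shared pool.
    static int rejected(int poolMax, int listeners, int maxPerListener) {
        // Core size 10, no queuing: comparable to the executor in the question.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, poolMax, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < listeners * maxPerListener; i++) {
                try {
                    // Each consumer occupies a thread until it is released.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // pool is at max and nothing can be queued
                }
            }
        } finally {
            release.countDown(); // let the blocked consumers finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("2 listeners x 30 on 40 threads: " + rejected(40, 2, 30) + " rejected");
        System.out.println("2 listeners x 20 on 40 threads: " + rejected(40, 2, 20) + " rejected");
    }
}
```

With 2 listeners allowed 30 consumers each, 60 tasks compete for 40 threads and 20 are rejected; capping each listener at 20 fits the pool exactly.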
Hopefully this might help someone else with a similar issue in the future....
I am progressing on writing my first Kafka Consumer by using Spring-Kafka. Had a look at the different options provided by framework, and have few doubts on the same. Can someone please clarify below if you have already worked on it.
Question - 1 : As per the Spring-Kafka documentation, there are 2 ways to implement a Kafka consumer: "You can receive messages by configuring a MessageListenerContainer and providing a message listener or by using the @KafkaListener annotation". Can someone tell me when I should choose one option over the other?
Question - 2 : I have chosen KafkaListener approach for writing my application. For this I need to initialize a container factory instance and inside container factory there is option to control concurrency. Just want to double check if my understanding about concurrency is correct or not.
Suppose I have a topic named MyTopic with 4 partitions. To consume messages from MyTopic, I've started 2 instances of my application, each with concurrency set to 2. Ideally, as per the Kafka assignment strategy, 2 partitions should go to consumer1 and the other 2 partitions to consumer2. Since the concurrency is set to 2, will each of the consumers start 2 threads and consume data from the topic in parallel? Also should we consider anything if we are consuming in parallel.
Question 3 - I have chosen manual ack mode and am not managing the offsets externally (not persisting them to any database/filesystem). Do I need to write custom code to handle rebalances, or will the framework manage it automatically? I think not, as I am acknowledging only after processing all the records.
Question - 4 : Also, with Manual ACK mode, which Listener will give more performance? BATCH Message Listener or normal Message Listener. I guess if I use Normal Message listener, the offsets will be committed after processing each of the messages.
Pasted the code below for your reference.
Batch Acknowledgement Consumer:
public void onMessage(List<ConsumerRecord<String, String>> records, Acknowledgment acknowledgment,
Consumer<?, ?> consumer) {
for (ConsumerRecord<String, String> record : records) {
System.out.println("Record : " + record.value());
// Process the message here..
listener.addOffset(record.topic(), record.partition(), record.offset());
}
acknowledgment.acknowledge();
}
Initialising container factory:
@Bean
public ConsumerFactory<String, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<String, String>(consumerConfigs());
}
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> configs = new HashMap<String, Object>();
configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServer);
configs.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
configs.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, enablAutoCommit);
configs.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, maxPolInterval);
configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
configs.put(ConsumerConfig.CLIENT_ID_CONFIG, clientId);
configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
return configs;
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
// Not sure about the impact of this property, so going with 2
factory.setConcurrency(2);
factory.setBatchListener(true);
factory.getContainerProperties().setAckMode(AckMode.MANUAL);
factory.getContainerProperties().setConsumerRebalanceListener(RebalanceListener.getInstance());
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setMessageListener(new BatchAckConsumer());
return factory;
}
@KafkaListener is a message-driven "POJO"; it adds stuff like payload conversion, argument matching, etc. If you implement MessageListener you can only get the raw ConsumerRecord from Kafka. See @KafkaListener Annotation.
Yes, the concurrency represents the number of threads; each thread creates a Consumer; they run in parallel; in your example, each would get 2 partitions.
Also should we consider anything if we are consuming in parallel.
Your listener must be thread-safe (no shared state, or any such state needs to be protected by locks).
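As a rough illustration of what thread-safe means here, the sketch below uses plain threads instead of Kafka consumers. CountingListener and consume are hypothetical names. The listener's only shared state is an AtomicLong, so concurrent calls from several consumer threads do not lose updates; a plain long field incremented with ++ would not give that guarantee.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ThreadSafeListenerDemo {

    // Hypothetical listener: one instance is shared by all consumer threads.
    static class CountingListener {
        private final AtomicLong processed = new AtomicLong();

        void onMessage(String value) {
            // Per-record work should only touch local variables or
            // thread-safe state such as this atomic counter.
            processed.incrementAndGet();
        }

        long processed() {
            return processed.get();
        }
    }

    // Simulates `consumerThreads` container threads calling the same listener.
    static long consume(int consumerThreads, int recordsPerThread) {
        CountingListener listener = new CountingListener();
        ExecutorService pool = Executors.newFixedThreadPool(consumerThreads);
        for (int t = 0; t < consumerThreads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    listener.onMessage("record-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return listener.processed();
    }

    public static void main(String[] args) {
        System.out.println("processed = " + consume(2, 100_000));
    }
}
```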
It's not clear what you mean by "handle rebalance events". When a rebalance occurs, the framework will commit any pending offsets.
It doesn't make a difference; message listener Vs. batch listener is just a preference. Even with a message listener, with MANUAL ackmode, the offsets are committed when all the results from the poll have been processed. With MANUAL_IMMEDIATE mode, the offsets are committed one-by-one.
Q1:
From the documentation,
The @KafkaListener annotation is used to designate a bean method as a
listener for a listener container. The bean is wrapped in a
MessagingMessageListenerAdapter configured with various features, such
as converters to convert the data, if necessary, to match the method
parameters.
You can configure most attributes on the annotation with SpEL by using
#{…} or property placeholders (${…}). See the Javadoc for more information.
This approach can be useful for simple POJO listeners and you do not need to implement any interfaces. You are also enabled to listen on any topics and partitions in a declarative way using the annotations. You can also potentially return the value you received whereas in case of MessageListener, you are bound by the signature of the interface.
Q2:
Ideally yes. If you have multiple topics to consume from, it gets more complicated though. Kafka by default uses RangeAssignor which has its own behaviour (you can change this -- see more details under).
Q3:
If your consumer dies, there will be rebalancing. If you acknowledge manually and your consumer dies before committing offsets, you do not need to do anything, Kafka handles that. But you could end up with some duplicate messages (at-least once)
Q4:
It depends what you mean by "performance". If you meant latency, then consuming each record as fast as possible will be the way to go. If you want to achieve high throughput, then batch consumption is more efficient.
I had written some samples using Spring kafka and various listeners - check out this repo
In my Spring Batch configuration I have this:
@Bean
public TaskExecutor taskExecutor() {
SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor("myJob");
taskExecutor.setConcurrencyLimit(15);
taskExecutor.setThreadNamePrefix("SrcToDest");
return taskExecutor;
}
And also I have a "master-step" where I am setting the grid-size as per below:
@Bean
@Qualifier("masterStep")
public Step masterStep() {
return stepBuilderFactory.get("masterStep").partitioner("step1", partitioner()).step(step1())
.taskExecutor(threadpooltaskExecutor()).taskExecutor(taskExecutor())
.gridSize(10).build();
}
In my case, I see only "Thread-x" at the end when "myjob" finishes with "COMPLETED" status.
Questions
For monitoring, how can I print the thread number to the console/log throughout the execution, i.e. from the start of "myjob" to its finish?
Is there some way I can get the output to console/log to see the grid action too?
I could not find any example or anywhere in Spring Guides for these.
Still looking how to display grid numbers to console
This depends on your partitioner. You can add a log statement in your partitioner and show the grid size. So at partitioning time, it's on your side.
At partition handling time, Spring Batch will show a log statement at debug level of each execution of the worker step.
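For the thread-name part of the question, a log statement inside the work that runs per partition is enough, since Thread.currentThread().getName() returns the name given by the executor. The sketch below is a plain java.util.concurrent analogy, not Spring Batch code: PartitionThreadLogDemo and runPartitions are hypothetical names, the fixed pool stands in for the task executor with its concurrency limit, and the loop stands in for the grid of partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionThreadLogDemo {

    // Runs one task per partition and logs which named thread executed it.
    static List<String> runPartitions(int gridSize, int concurrencyLimit) {
        AtomicInteger threadCounter = new AtomicInteger();
        // Threads are named with the same prefix as in the question.
        ExecutorService pool = Executors.newFixedThreadPool(concurrencyLimit,
                r -> new Thread(r, "SrcToDest-" + threadCounter.incrementAndGet()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int p = 0; p < gridSize; p++) {
                final int partition = p;
                futures.add(pool.submit(() -> {
                    String line = "partition" + partition
                            + " running on " + Thread.currentThread().getName();
                    System.out.println(line); // the log statement itself
                    return line;
                }));
            }
            List<String> lines = new ArrayList<>();
            for (Future<String> future : futures) {
                lines.add(future.get()); // wait for every partition
            }
            return lines;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runPartitions(10, 3);
    }
}
```

In a real job the equivalent statement would go into the partitioner (for the grid size) and into the worker step's reader, processor or writer (for the thread name).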
I have seen this code many times but don't know what is the advantage/disadvantage for it. In Spring Boot applications, I saw people define this bean.
@Bean
@Qualifier("heavyLoadBean")
public ExecutorService heavyLoadBean() {
return Executors.newWorkStealingPool();
}
Then whenever a CompletableFuture object is created in the service layer, that heavyLoadBean is used.
public CompletionStage<T> myService() {
return CompletableFuture.supplyAsync(() -> doingVeryBigThing(), heavyLoadBean);
}
Then the controller will call the service.
@GetMapping("/some/path")
public CompletionStage<SomeModel> doIt() {
return service.myService();
}
I don't see the point of doing that. Tomcat in Spring Boot has x number of threads. All the threads are used to process user requests. What is the point of using a different thread pool here? Anyway the user expects to see response coming back.
CompletableFuture is used to process tasks asynchronously. If your application has two tasks that are independent of each other, you can execute them concurrently to reduce the processing time:
public CompletionStage<Void> myService() {
CompletableFuture<?> first = CompletableFuture.supplyAsync(() -> doingVeryBigThing(), heavyLoadBean);
CompletableFuture<?> second = CompletableFuture.supplyAsync(() -> doingAnotherBigThing(), heavyLoadBean);
// Completes once both independent tasks have finished
return CompletableFuture.allOf(first, second);
}
In the above example, doingVeryBigThing() and doingAnotherBigThing() are two tasks that are independent of each other, so they can be executed concurrently on different threads from the heavyLoadBean thread pool. The example below prints the name of the thread each task runs on (typically two different names).
public void myService() {
CompletableFuture.runAsync(() -> System.out.println(Thread.currentThread().getName()), heavyLoadBean);
CompletableFuture.runAsync(() -> System.out.println(Thread.currentThread().getName()), heavyLoadBean);
}
If you don't provide a thread pool, the supplied Supplier is executed by ForkJoinPool.commonPool() by default.
public static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier)
Returns a new CompletableFuture that is asynchronously completed by a task running in the ForkJoinPool.commonPool() with the value obtained by calling the given Supplier.
public static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier, Executor executor)
Returns a new CompletableFuture that is asynchronously completed by a task running in the given executor with the value obtained by calling the given Supplier.
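The difference between the two overloads can be observed with a short sketch. SupplyAsyncDemo, threadNames and the "heavy-load" thread name are hypothetical; the fixed pool is only a stand-in for the heavyLoadBean executor from the question. Each supplier returns the name of the thread it ran on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SupplyAsyncDemo {

    // Returns { thread name without executor, thread name with executor }.
    static String[] threadNames() {
        // Hypothetical dedicated pool whose threads carry a recognisable name.
        ExecutorService heavyLoadPool = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r, "heavy-load");
            t.setDaemon(true);
            return t;
        });
        try {
            // One-argument overload: the JDK chooses where the supplier runs.
            CompletableFuture<String> onDefault =
                    CompletableFuture.supplyAsync(() -> Thread.currentThread().getName());
            // Two-argument overload: the supplier runs on the given executor.
            CompletableFuture<String> onCustom =
                    CompletableFuture.supplyAsync(() -> Thread.currentThread().getName(), heavyLoadPool);
            return new String[] { onDefault.join(), onCustom.join() };
        } finally {
            heavyLoadPool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] names = threadNames();
        System.out.println("without executor: " + names[0]);
        System.out.println("with executor:    " + names[1]);
    }
}
```

The second name always comes from the dedicated pool; the first is usually a ForkJoinPool.commonPool worker, though the exact name depends on the JVM and the number of available processors.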
Please check the comments on the main post and the other solutions. They will give you a better understanding of Java 8 CompletableFuture. I just don't feel the right answer was given, though.
From our discussions, I can see the purpose of having a different thread pool instead of using the default thread pool is that the default thread pool is also used by the main web server (spring boot - tomcat). Let's say 8 threads.
If we use up all 8 threads, the server appears to be unresponsive. However, if you use a different thread pool and exhaust that thread pool with your long-running processes, you will get different errors in your code, and the server can still respond to other user requests.
Correct me if I'm wrong.
Even after reading plenty of SO questions (1,2) and articles, it is unclear which is the better option to set for consumers. Multiple consumers or a higher prefetch value?
From what I understand, when it comes to SimpleRabbitListenerContainerFactory, as it was designed initially to have only one thread per connection, does that mean that setting multiple consumers won't make much difference, as there is only one thread that actually consumes from Rabbit and then hands it off to the multiple consumers (threads)?
Or there are actually several consumers consuming at the same time?
So what is the best practice when it comes to spring implementation of rabbit concerning prefetch/consumers? When should one be used over the other? And should I switch to this new DirectRabbitListenerContainerFactory? Is it 'better' or just depends on the use case?
One downside I see with a high prefetch is that it may cause memory issues if an app consumes more messages than it can hold in the buffer (I haven't actually tested this yet, tbh).
And when it comes to multiple consumers, I see the downside of having more file descriptors open at the OS level, and I saw an article saying that each consumer actually pings Rabbit for each ack, which makes it slower.
FYI, if it is relevant, I usually have my config set up like this:
@Bean
public ConnectionFactory connectionFactory() {
final CachingConnectionFactory connectionFactory = new CachingConnectionFactory(server);
connectionFactory.setUsername(username);
connectionFactory.setPassword(password);
connectionFactory.setVirtualHost(virtualHost);
connectionFactory.setRequestedHeartBeat(requestedHeartBeat);
return connectionFactory;
}
@Bean
public AmqpAdmin amqpAdmin() {
AmqpAdmin admin = new RabbitAdmin(connectionFactory());
admin.declareQueue(getRabbitQueue());
return admin;
}
@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory() {
final SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
factory.setConnectionFactory(connectionFactory());
factory.setConcurrentConsumers(concurrency);
factory.setMaxConcurrentConsumers(maxConcurrency);
factory.setPrefetchCount(prefetch);
factory.setMissingQueuesFatal(false);
return factory;
}
@Bean
public Queue getRabbitQueue() {
final Map<String, Object> p = new HashMap<String, Object>();
p.put("x-max-priority", 10);
return new Queue(queueName, true, false, false, p);
}
No; the SMLC wasn't "designed for one thread per connection"; it was designed to address a limitation that the amqp-client only had one thread per connection, so that thread hands off to consumer threads via an in-memory queue. That is no longer the case: the client is multi-threaded and there is one dedicated thread per consumer.
Having multiple consumers (increasing the concurrency) is completely effective (and was, even with the older client).
Prefetch is really to reduce network chatter and improve overall throughput. Whether you need to increase concurrency really is orthogonal to prefetch. You would typically increase concurrency if (a) your listener is relatively slow to process each message and (b) strict message ordering is not important.
The DirectListenerContainer was introduced to provide a different threading model, where the listener is invoked directly on the amqp-client thread.
The reasons for choosing one container over the other are described in Choosing a Container.
The following features are available with the SMLC, but not the DMLC:
txSize - with the SMLC, you can set this to control how many messages are delivered in a transaction and/or to reduce the number of acks, but it may cause the number of duplicate deliveries to increase after a failure. (The DMLC does have messagesPerAck which can be used to reduce the acks, the same as with txSize and the SMLC, but it can’t be used with transactions - each message is delivered and ack’d in a separate transaction).
maxConcurrentConsumers and consumer scaling intervals/triggers - there is no auto-scaling in the DMLC; it does, however, allow you to programmatically change the consumersPerQueue property and the consumers will be adjusted accordingly.
However, the DMLC has the following benefits over the SMLC:
Adding and removing queues at runtime is more efficient; with the SMLC, the entire consumer thread is restarted (all consumers canceled and re-created); with the DMLC, unaffected consumers are not canceled.
The context switch between the RabbitMQ Client thread and the consumer thread is avoided.
Threads are shared across consumers rather than having a dedicated thread for each consumer in the SMLC. However, see the IMPORTANT note about the connection factory configuration in the section called “Threading and Asynchronous Consumers”.