I'm hoping this is a simple configuration issue but I can't seem to figure out what it might be.
Set-up
Spring Boot 2.2.2.RELEASE
cloud-starter
cloud-starter-aws
spring-jms
spring-cloud-dependencies Hoxton.SR1
amazon-sqs-java-messaging-lib 1.0.8
Problem
My application starts up fine and begins to process messages from Amazon SQS. After some amount of time I see the following warning
2020-02-01 04:16:21.482 LogLevel=WARN 1 --- [ecutor-thread14] o.s.j.l.DefaultMessageListenerContainer : Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers.
The above warning gets printed multiple times and eventually I see the following two INFO messages
2020-02-01 04:17:51.552 LogLevel=INFO 1 --- [ecutor-thread40] c.a.s.javamessaging.SQSMessageConsumer : Shutting down ConsumerPrefetch executor
2020-02-01 04:18:06.640 LogLevel=INFO 1 --- [ecutor-thread40] com.amazon.sqs.javamessaging.SQSSession : Shutting down SessionCallBackScheduler executor
The above 2 messages will display several times and at some point no more messages are consumed from SQS. I don't see any other messages in my log to indicate an issue, but I get no messages from my handlers that they are processing messages (I have 2~) and I can see the AWS SQS queue growing in the number of messages and the age.
~: This exact code was working fine when I had a single handler, this problem started when I added the second one.
Configuration/Code
I realize the first warning is caused by the concurrency of the ThreadPoolTaskExecutor, but I cannot find a configuration that works properly. Here is my current JMS configuration. I have tried various max pool sizes with no real effect, other than the warnings starting sooner or later depending on the pool size.
public ThreadPoolTaskExecutor asyncAppConsumerTaskExecutor() {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setThreadGroupName("asyncConsumerTaskExecutor");
taskExecutor.setThreadNamePrefix("asyncConsumerTaskExecutor-thread");
taskExecutor.setCorePoolSize(10);
// Allow the thread pool to grow up to 4 times the core size, evidently not
// having the pool be larger than the max concurrency causes the JMS queue
// to barf on itself with messages like
// "Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers"
taskExecutor.setMaxPoolSize(10 * 4);
taskExecutor.setQueueCapacity(0); // do not queue up messages
taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
taskExecutor.setAwaitTerminationSeconds(60);
return taskExecutor;
}
Here is the JMS Container Factory we create
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(SQSConnectionFactory sqsConnectionFactory, ThreadPoolTaskExecutor asyncConsumerTaskExecutor) {
DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
factory.setConnectionFactory(sqsConnectionFactory);
factory.setDestinationResolver(new DynamicDestinationResolver());
// The JMS processor will start 'concurrency' number of tasks
// and supposedly will increase this to the max of '10 * 3'
factory.setConcurrency(10 + "-" + (10 * 3));
factory.setTaskExecutor(asyncConsumerTaskExecutor);
// Let the task process 100 messages, default appears to be 10
factory.setMaxMessagesPerTask(100);
// Wait up to 5 seconds for a timeout, this keeps the task around a bit longer
factory.setReceiveTimeout(5000L);
factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
return factory;
}
I added the setMaxMessagesPerTask & setReceiveTimeout calls based on stuff found on the internet, the problem persists without these and at various settings (50, 2500L, 25, 1000L, etc...)
We create a default SQS connection factory
public SQSConnectionFactory sqsConnectionFactory(AmazonSQS amazonSQS) {
return new SQSConnectionFactory(new ProviderConfiguration(), amazonSQS);
}
Finally the handlers look like this
@JmsListener(destination = "consumer-event-queue")
public void receiveEvents(String message) throws IOException {
MyEventDTO myEventDTO = jsonObj.readValue(message, MyEventDTO.class);
//messageTask.process(myEventDTO);
}
@JmsListener(destination = "myalert-sqs")
public void receiveAlerts(String message) throws IOException, InterruptedException {
final MyAlertDTO myAlert = jsonObj.readValue(message, MyAlertDTO.class);
myProcessor.addAlertToQueue(myAlert);
}
You can see in the first function (receiveEvents) we just take the message from the queue and exit, we have not implemented the processing code for that.
The second function (receiveAlerts) gets the message, the myProcessor.addAlertToQueue function creates a runnable object and submits it to a threadpool to be processed at some point in the future.
The problem (the warning, the INFO messages, and the failure to consume messages) only started when we added the receiveAlerts function; previously the other function was the only one present and we did not see this behavior.
More
This is part of a larger project and I am working on breaking this code out into a smaller test case to see if I can duplicate this issue. I will post a follow-up with the results.
In the Mean Time
I'm hoping this is just a config issue and someone more familiar with this can tell me what I'm doing wrong, or that someone can provide some thoughts and comments on how to correct this to work properly.
Thank you!
After fighting this one for a bit I think I finally resolved it.
The issue appears to be due to the "DefaultJmsListenerContainerFactory": this factory creates a new "DefaultMessageListenerContainer" for EACH method with a '@JmsListener' annotation. The person who originally wrote the code thought the factory was only called once for the application and that the created container would be re-used. So the issue was two-fold:
1. The 'ThreadPoolTaskExecutor' attached to the factory had 40 threads. When the application had one '@JmsListener' method this worked fine, but when we added a second method, each method got 10 threads (20 in total) for listening. That is fine on its own; however, since we stated that each listener could grow up to 30 consumers, the two listeners together could ask for up to 60 threads and we quickly ran out of threads in that pool. This caused the "Number of scheduled consumers has dropped below concurrentConsumers limit" warning.
2. This is probably obvious given the above, but I wanted to call it out explicitly. In the listener factory we set the concurrency to "10-30"; however, all of the listeners have to share that pool. As such, each listener's max concurrency has to be small enough that, if every listener scales to its maximum, the total does not exceed the maximum number of threads in the pool (e.g. if we have 2 '@JmsListener' annotated methods and a pool with 40 threads, then the max value can be no more than 20).
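To make the arithmetic concrete, here is a minimal sketch using only the JDK. The class and method names (ListenerThreadBudget, maxPerListener, concurrency, acceptedTasks) are hypothetical and not part of Spring. It assumes a pool that behaves like the ThreadPoolTaskExecutor above: a queue capacity of 0 gives a SynchronousQueue, and the default rejection policy throws once every thread is busy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not a Spring class: sizes per-listener concurrency
// so that all listeners together fit into one shared thread pool.
final class ListenerThreadBudget {

    /** Largest per-listener maximum that still fits in the shared pool. */
    static int maxPerListener(int poolMaxSize, int listenerCount) {
        if (poolMaxSize <= 0 || listenerCount <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return poolMaxSize / listenerCount;
    }

    /** Builds a "min-max" concurrency string such as "10-20". */
    static String concurrency(int minPerListener, int poolMaxSize, int listenerCount) {
        int max = maxPerListener(poolMaxSize, listenerCount);
        if (minPerListener > max) {
            throw new IllegalArgumentException("pool too small for the requested minimum");
        }
        return minPerListener + "-" + max;
    }

    /** Counts how many of the requested long-running tasks a zero-queue pool accepts. */
    static int acceptedTasks(int poolMaxSize, int requested) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolMaxSize, poolMaxSize, 0L, TimeUnit.SECONDS, new SynchronousQueue<>());
        // Each task blocks until released, like a consumer that holds its thread.
        Runnable consumer = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int accepted = 0;
        try {
            for (int i = 0; i < requested; i++) {
                try {
                    pool.execute(consumer);
                    accepted++;
                } catch (RejectedExecutionException e) {
                    // Pool exhausted: this is the "tasks having been rejected" case.
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Two listeners sharing a 40-thread pool.
        System.out.println(concurrency(10, 40, 2)); // prints 10-20
        // Two listeners each trying to reach 30 consumers ask for 60 threads in total.
        System.out.println(acceptedTasks(40, 60));  // prints 40
    }
}
```

The resulting string is what I would pass to factory.setConcurrency(...). Giving each listener its own container factory and executor would be another way to avoid sharing the budget.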
Hopefully this might help someone else with a similar issue in the future....
Related
I am trying to write a Kafka consumer application with spring-kafka. As a consumer, I have to make sure I do not miss any records and that all records get processed.
My application design is like this :
Topics --> Read records from topic --> dump it into a table A in oracle database --> pick records from a table A --> call rest api to update records in system table B --> update response of API in table a --> commit records
Retry Mechanism on API level :
Now, if any of the records fails, meaning the response code is not as desired (400, 500, etc.), we retry those records 2 times.
Retry Mechanism on Topic level :
But what if I get an error while committing offsets? That means I need some kind of retry mechanism at the topic level as well. I went over the documentation and found an option: SeekToCurrentErrorHandler
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setAckOnError(false);
factory.getContainerProperties().setAckMode(AckMode.RECORD);
factory.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(1000L, 2L)));
return factory;
}
As I understand it, if I am not able to commit an offset, then with the above code the delivery will be retried up to 2 times (3 delivery attempts) with a back-off of 1 second. Does this mean my whole flow will be replayed twice? If so, do I still need a separate retry mechanism at the API level?
I am just trying to understand how I can make my consumer application more resilient, so that no record is missed and there is a mechanism to handle any errors or missed records. Please suggest.
It's best to avoid situations where the offsets can't be committed (make sure the max.poll.interval.ms is sufficient).
But yes, if committing the offsets fails (and the container's syncCommits property is true), the record will be redelivered to the application. If syncCommits is false, the failure will simply be logged (or sent to your listener) and the "next" offset for that partition will have its offset committed later (presumably).
Adding retry at the application level (e.g. using a RetryTemplate in the listener adapter - via the container factory) will still suffer from the same problem; it also can cause a rebalance if the retries take too long.
If you really want to avoid reprocessing in this situation, you need to make your listener code idempotent - e.g. store the topic/partition/offset someplace to indicate you have already processed that record.
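As a rough illustration of the idempotency idea, here is a minimal sketch using only the JDK. The class and method names (IdempotentProcessor, processOnce) are hypothetical. It also assumes an in-memory set as the store, which is lost on restart; in your case a unique key on topic/partition/offset in table A would be the durable equivalent.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers which records were already handled so that
// a redelivery of the same topic/partition/offset becomes a no-op.
final class IdempotentProcessor {

    // Assumption: in-memory store for illustration only; use a database for durability.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /**
     * Runs the work only if this record has not been processed before.
     *
     * @return true if the work ran, false if the record was a duplicate
     */
    boolean processOnce(String topic, int partition, long offset, Runnable work) {
        String key = topic + "/" + partition + "/" + offset;
        if (!processed.add(key)) {
            return false; // already handled, skip the redelivery
        }
        try {
            work.run();
            return true;
        } catch (RuntimeException e) {
            // Processing failed: forget the key so a later redelivery can retry.
            processed.remove(key);
            throw e;
        }
    }

    public static void main(String[] args) {
        IdempotentProcessor processor = new IdempotentProcessor();
        Runnable work = () -> System.out.println("calling REST API");
        System.out.println(processor.processOnce("events", 0, 42L, work)); // runs the work
        System.out.println(processor.processOnce("events", 0, 42L, work)); // duplicate, skipped
    }
}
```

Inside a listener, the topic, partition and offset would come from the ConsumerRecord you receive.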
We use the spring-integration framework extensively. In one of our use cases we pull a large amount of data from a third-party API. This takes some time, 60 seconds or more, to get a 200 OK response and the data. But in some cases the data is so large that we start getting
o.s.a.r.l.SimpleMessageListenerContainer Stopping container from aborted consumer java.lang.OutOfMemoryError: Java heap space
When this error occurs, the queue consumer (thread) dies, and this is reflected in the RabbitMQ console. I want to figure out a way to catch this error in my application so that a relevant error is raised.
@Service
public class FaiureListener implements ApplicationListener<ListenerContainerConsumerFailedEvent> {
@Autowired
HangoutAlertPoster alertSender;
@Override
public void onApplicationEvent(ListenerContainerConsumerFailedEvent event) {
alertSender.sendHangoutAlert("[FATAL] Consumer aborted error. Reason="+event.getReason());
}
}
OOM errors are generally fatal and you have to restart the JVM.
You can add an ApplicationListener or @EventListener to receive a ListenerContainerConsumerFailedEvent which contains the cause.
See https://docs.spring.io/spring-amqp/docs/2.2.8.RELEASE/reference/html/#consumer-events
If your messages are large, you should reduce the prefetch count so that fewer messages are held in memory, or consider using a DirectMessageListenerContainer instead.
See Choosing a Container.
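To see why prefetch matters for memory, here is a back-of-the-envelope sketch using only the JDK. The class and method names (PrefetchEstimate, bufferedBytes) are hypothetical, and the numbers are purely illustrative: it assumes each consumer can hold up to its prefetch count of unacknowledged messages in memory, and ignores any per-message overhead.

```java
// Hypothetical helper: worst-case bytes held in memory by prefetched messages.
final class PrefetchEstimate {

    /** consumers x prefetch x average message size, with overflow checking. */
    static long bufferedBytes(int consumers, int prefetch, long avgMessageBytes) {
        long inFlightMessages = Math.multiplyExact((long) consumers, (long) prefetch);
        return Math.multiplyExact(inFlightMessages, avgMessageBytes);
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024L;
        // Illustrative figures: 4 consumers, prefetch of 250, 1 MiB messages.
        System.out.println(bufferedBytes(4, 250, oneMiB) / oneMiB + " MiB"); // prints 1000 MiB
        // Same load with the prefetch reduced to 10.
        System.out.println(bufferedBytes(4, 10, oneMiB) / oneMiB + " MiB");  // prints 40 MiB
    }
}
```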
I profiled my Kafka producer Spring Boot application and found many "kafka-producer-network-thread" threads running (47 in total). They never stop running, even when no data is being sent.
My application looks a bit like this:
var kafkaSender = KafkaSender(kafkaTemplate, applicationProperties)
kafkaSender.sendToKafka(json, rs.getString("KEY"))
with the KafkaSender:
@Service
class KafkaSender(val kafkaTemplate: KafkaTemplate<String, String>, val applicationProperties: ApplicationProperties) {
@Transactional(transactionManager = "kafkaTransactionManager")
fun sendToKafka(message: String, stringKey: String) {
kafkaTemplate.executeInTransaction { kt ->
kt.send(applicationProperties.kafka.topic, System.currentTimeMillis().mod(10).toInt(), System.currentTimeMillis().rem(10).toString(),
message)
}
}
companion object {
val log = LoggerFactory.getLogger(KafkaSender::class.java)!!
}
}
Since each time I want to send a message to Kafka I instantiate a new KafkaSender, I thought a new thread would be created which then sends the message to the kafka queue.
Currently it looks like a pool of producers is generated, but never cleaned up, even when none of them has anything to do.
Is this behaviour intended?
In my opinion the behaviour should be nearly the same as datasource pooling, keep the thread alive for some time, but when there is nothing to do, clear it up.
When using transactions, the producer cache grows on demand and is not reduced.
If you are producing messages on a listener container (consumer) thread, there is a producer for each topic/partition/consumer group. This is required to solve the zombie fencing problem, so that if a rebalance occurs and the partition moves to a different instance, the transaction id will remain the same and the broker can properly handle the situation.
If you don't care about the zombie fencing problem (and you can handle duplicate deliveries), set the producerPerConsumerPartition property to false on the DefaultKafkaProducerFactory and the number of producers will be much smaller.
EDIT
Starting with version 2.8 the default EOSMode is now V2 (aka BETA); which means it is no longer necessary to have a producer per topic/partition/group - as long as the broker version is 2.5 or later.
Even after reading plenty of SO questions (1,2) and articles, it is unclear to me which is the better option to set for consumers: multiple consumers or a higher prefetch value?
From what I understand, SimpleRabbitListenerContainerFactory was initially designed around the amqp-client having only one thread per connection. Does that mean that setting multiple consumers won't make much difference, as there is only one thread that actually consumes from Rabbit and then hands off to the multiple consumers (threads)?
Or there are actually several consumers consuming at the same time?
So what is the best practice when it comes to spring implementation of rabbit concerning prefetch/consumers? When should one be used over the other? And should I switch to this new DirectRabbitListenerContainerFactory? Is it 'better' or just depends on the use case?
One downside I see with a high prefetch is that it might cause memory issues if an app consumes more messages than it can hold in the buffer (I haven't actually tested this yet, tbh).
And when it comes to multiple consumers, I see the downside of having more file descriptors open at the OS level, and I saw an article saying that each consumer pings Rabbit for each ack, which makes it slower.
FYI, if it is relevant, I usually have my config set up like this:
@Bean
public ConnectionFactory connectionFactory() {
final CachingConnectionFactory connectionFactory = new CachingConnectionFactory(server);
connectionFactory.setUsername(username);
connectionFactory.setPassword(password);
connectionFactory.setVirtualHost(virtualHost);
connectionFactory.setRequestedHeartBeat(requestedHeartBeat);
return connectionFactory;
}
@Bean
public AmqpAdmin amqpAdmin() {
AmqpAdmin admin = new RabbitAdmin(connectionFactory());
admin.declareQueue(getRabbitQueue());
return admin;
}
@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory() {
final SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
factory.setConnectionFactory(connectionFactory());
factory.setConcurrentConsumers(concurrency);
factory.setMaxConcurrentConsumers(maxConcurrency);
factory.setPrefetchCount(prefetch);
factory.setMissingQueuesFatal(false);
return factory;
}
@Bean
public Queue getRabbitQueue() {
final Map<String, Object> p = new HashMap<String, Object>();
p.put("x-max-priority", 10);
return new Queue(queueName, true, false, false, p);
}
No; the SMLC wasn't "designed for one thread per connection". It was designed to address a limitation that the amqp-client only had one thread per connection, so that thread handed off to consumer threads via an in-memory queue. That is no longer the case: the client is multi-threaded and there is one dedicated thread per consumer.
Having multiple consumers (increasing the concurrency) is completely effective (and was, even with the older client).
Prefetch is really to reduce network chatter and improve overall throughput. Whether you need to increase concurrency really is orthogonal to prefetch. You would typically increase concurrency if (a) your listener is relatively slow to process each message and (b) strict message ordering is not important.
The DirectMessageListenerContainer (DMLC) was introduced to provide a different threading model, where the listener is invoked directly on the amqp-client thread.
The reasons for choosing one container over the other are described in Choosing a Container.
The following features are available with the SMLC, but not the DMLC:
txSize - with the SMLC, you can set this to control how many messages are delivered in a transaction and/or to reduce the number of acks, but it may cause the number of duplicate deliveries to increase after a failure. (The DMLC does have messagesPerAck which can be used to reduce the acks, the same as with txSize and the SMLC, but it can’t be used with transactions - each message is delivered and ack’d in a separate transaction).
maxConcurrentConsumers and consumer scaling intervals/triggers - there is no auto-scaling in the DMLC; it does, however, allow you to programmatically change the consumersPerQueue property and the consumers will be adjusted accordingly.
However, the DMLC has the following benefits over the SMLC:
Adding and removing queues at runtime is more efficient; with the SMLC, the entire consumer thread is restarted (all consumers canceled and re-created); with the DMLC, unaffected consumers are not canceled.
The context switch between the RabbitMQ Client thread and the consumer thread is avoided.
Threads are shared across consumers rather than having a dedicated thread for each consumer in the SMLC. However, see the IMPORTANT note about the connection factory configuration in the section called “Threading and Asynchronous Consumers”.
Two questions:
I have a @StreamListener reading from a RabbitMQ channel. I have a ThreadPoolTaskExecutor with a pool of 500 threads to process the messages as they are read.
The problem is that @StreamListener keeps reading messages even if the pool is completely utilized.
Caused by: org.springframework.core.task.TaskRejectedException:
Executor [java.util.concurrent.ThreadPoolExecutor@4c15ce96
[Running, pool size = 500, active threads = 500, queued tasks = 1500,
completed tasks = 1025020]] did not accept task:
org.springframework.cloud.sleuth.instrument.async.SpanContinuingTraceCallable@4dc03919
Is there a way to configure @StreamListener so that it only reads from the queue if it has capacity?
In addition, this error trickles up to an UndeclaredThrowableException. I think it's trying to throw the exception back to RabbitMQ so that it requeues the message. However, the end result is this:
[WARN] o.s.a.r.l.ConditionalRejectingErrorHandler
Execution of Rabbit message listener failed.
org.springframework.amqp.rabbit.listener.exception
.ListenerExecutionFailedException:
Retry Policy Exhausted
The final result is my message is lost.
Any suggestions for this second issue?
Did you try CallerRunsPolicy for your ThreadPoolTaskExecutor? This way the task won't end with an error, and the thread from the SimpleMessageListenerContainer will be busy running the task for the message that just arrived. As long as you don't use the maxConcurrentConsumers option, no new concurrent listeners will be started, so the current one (concurrentConsumers = 1 by default) will be busy and no new messages will be pulled from RabbitMQ.
See more info about listener container concurrency in the Docs. This way you may even reconsider your custom ThreadPoolTaskExecutor solution and will fully rely on the built-in mechanism in the Framework.
The maxConcurrency option is exposed for the RabbitMQ Binder Consumer as well.
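To show the back-pressure effect of CallerRunsPolicy in isolation, here is a minimal sketch with a plain JDK ThreadPoolExecutor. The class and method names (CallerRunsBackpressure, threadsUsedWhenSaturated) are hypothetical, and the tiny pool (one thread, queue of one) is chosen only to make saturation easy to reproduce. With a ThreadPoolTaskExecutor the equivalent setting would be setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy()).

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: when the pool is saturated, CallerRunsPolicy makes the
// submitting thread run the task itself instead of throwing a rejection.
final class CallerRunsBackpressure {

    /** Returns the names of the threads that ran each task, in completion order. */
    static List<String> threadsUsedWhenSaturated() {
        List<String> ranOn = new CopyOnWriteArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            // Task 1 occupies the only worker thread until released.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                ranOn.add(Thread.currentThread().getName());
            });
            // Task 2 fills the single queue slot.
            pool.execute(() -> ranOn.add(Thread.currentThread().getName()));
            // Task 3 finds the pool saturated, so it runs right here on the
            // submitting thread, which cannot submit anything else meanwhile.
            pool.execute(() -> ranOn.add(Thread.currentThread().getName()));
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return ranOn;
    }

    public static void main(String[] args) {
        // The first entry is the caller's own thread; the rest are pool threads.
        System.out.println(threadsUsedWhenSaturated());
    }
}
```

In the listener scenario, the submitting thread is the container's consumer thread, so while it is busy running the task it does not hand over further messages.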