We use the Spring Integration framework extensively. In one of our use cases we pull a large amount of data from a third-party API. It can take 60 seconds or more to get a 200 OK response with the data, and in some cases the payload is so large that we start getting
o.s.a.r.l.SimpleMessageListenerContainer Stopping container from aborted consumer java.lang.OutOfMemoryError: Java heap space
When this error occurs the queue consumer (thread) dies, which is reflected in the RabbitMQ console. I want a way to catch this error in my application so that a relevant alert can be raised.
@Service
public class FailureListener implements ApplicationListener<ListenerContainerConsumerFailedEvent> {

    @Autowired
    HangoutAlertPoster alertSender;

    @Override
    public void onApplicationEvent(ListenerContainerConsumerFailedEvent event) {
        alertSender.sendHangoutAlert("[FATAL] Consumer aborted error. Reason=" + event.getReason());
    }
}
OOM errors are generally fatal and you have to restart the JVM.
You can add an ApplicationListener or @EventListener to receive a ListenerContainerConsumerFailedEvent, which contains the cause.
See https://docs.spring.io/spring-amqp/docs/2.2.8.RELEASE/reference/html/#consumer-events
If your messages are large, you should reduce the prefetch count so that fewer messages are held in memory, or consider using a DirectMessageListenerContainer instead.
See Choosing a Container.
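For example, with Spring AMQP's Java configuration, something like this should lower the prefetch (a minimal sketch; the bean method and the value of 1 are just illustrative):

@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(ConnectionFactory connectionFactory) {
    SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory);
    // keep only one unacknowledged message in memory per consumer
    factory.setPrefetchCount(1);
    return factory;
}

With Spring Boot, the equivalent property should be spring.rabbitmq.listener.simple.prefetch=1, and spring.rabbitmq.listener.type=direct switches to the DirectMessageListenerContainer.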
We have implemented an SqsListener as the documentation suggests it is the best way to receive AWS SQS messages (Spring Cloud AWS docs):
There are two ways for receiving SQS messages, either use the receive
methods of the QueueMessagingTemplate or with annotation-driven
listener endpoints. The latter is by far the more convenient way to
receive messages.
Everything is working as expected. If the business process fails, we throw a runtime exception and the message is sent back to the SQS queue for retry; once the visibility timeout passes, the message reappears to the worker for processing.
Sample code is here:
@SqsListener(value = "sample-standard-queue", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
public void receiveMessage(String message) {
    log.info("Message Received **************************** " + message);
    log.info("After Conversion" + new JSONObject(message).getString("payload"));
    throw new RuntimeException("An exception was thrown during the execution of the SQS listener method and Message will be still available in Queue");
}
But there are some examples where "Acknowledgment" is used instead of throwing a runtime exception, which the documentation doesn't mention.
Which is the best way to deal with a business-logic failure scenario? Is Acknowledgment necessary?
Thanks in advance.
One way is to keep track of the messages being processed in an RDS table. Whenever a message is retried, increase its retry count in the table.
There should be a configured number of retries for any particular message; once that is exhausted, you may want to move the message to a dead-letter queue, or log it and simply discard it.
There are multiple ways of handling it; one way can be:
@SqsListener(value = "sample-standard-queue", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
public void receiveMessage(String message) {
    try {
        log.info("Message Received **************************** " + message);
        log.info("After Conversion" + new JSONObject(message).getString("payload"));
    } catch (Exception e) {
        // check whether the retry count for this message is exhausted
        // if exhausted: acknowledge it (push it into the dead-letter queue) and don't rethrow
        // if not exhausted: increase the retry count in the table before throwing the exception
        throw new RuntimeException("An exception was thrown during the execution of the SQS listener method and Message will be still available in Queue");
    }
}
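For comparison, the Acknowledgment-based style mentioned in the question would look roughly like this (a sketch assuming Spring Cloud AWS's Acknowledgment method parameter and the NEVER deletion policy). The message is only deleted from the queue when acknowledge() is called, so skipping the call on failure has the same retry effect as throwing the exception:

@SqsListener(value = "sample-standard-queue", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void receiveMessage(String message, Acknowledgment acknowledgment) {
    try {
        log.info("Message Received **************************** " + message);
        log.info("After Conversion" + new JSONObject(message).getString("payload"));
        // delete the message only after the business logic has succeeded
        acknowledgment.acknowledge();
    } catch (Exception e) {
        // no acknowledge(): the message becomes visible again after the visibility timeout
        log.error("Processing failed, message will be retried", e);
    }
}

So Acknowledgment is not strictly necessary; ON_SUCCESS plus an exception gives the same retry behaviour, while NEVER plus an explicit acknowledge() just gives finer control over when the delete happens.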
I have a JMS message endpoint like:
@Bean
public JmsMessageDrivenEndpoint fsJmsMessageDrivenEndpoint(ConnectionFactory fsConnectionFactory,
                                                           Destination fsInboundDestination,
                                                           MessageConverter fsMessageConverter) {
    return Jms.messageDrivenChannelAdapter(fsConnectionFactory)
            .destination(fsInboundDestination)
            .jmsMessageConverter(fsMessageConverter)
            .outputChannel("fsChannelRouter.input")
            .errorChannel("fsErrorChannel.input")
            .get();
}
So, my question is: will it fetch the next message before the current message has been processed? And if so, will it keep fetching messages from the MQ queue until it fills up all the memory? How do I avoid that?
The JmsMessageDrivenEndpoint is based on the JmsMessageListenerContainer, its threading model and the MessageListener callback for pulled messages. As long as your MessageListener blocks, it doesn't go to the next message in the queue to pull. When we build an integration flow starting with a JmsMessageDrivenEndpoint, it becomes the MessageListener callback. As long as we process the message downstream in the same thread (DirectChannel by default between endpoints), we don't pull the next message from the JMS queue. If you place a QueueChannel or an ExecutorChannel in between, you shift processing to a different thread; the current one (the JMS listener) gets control back and is ready to pull the next message. In that case your concern about memory is correct. You can still use a QueueChannel with a limited size, or your ExecutorChannel can be configured with a limited thread pool.
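If you do introduce a hand-off, bounding it is straightforward. A minimal sketch, assuming a QueueChannel placed between the adapter and the downstream handler (the channel name and the capacity of 100 are just illustrative, and the consuming side then needs a poller):

@Bean
public QueueChannel fsBufferChannel() {
    // at most 100 messages are buffered; with the default send timeout,
    // the JMS listener thread blocks when the queue is full instead of pulling more
    return new QueueChannel(100);
}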
In any case, my recommendation is not to do any thread shifting in the flow when you start from a JMS listener container. It is better to block for the next message and let the current transaction finish its job, so you won't lose a message when something crashes.
I'm hoping this is a simple configuration issue but I can't seem to figure out what it might be.
Set-up
Spring Boot 2.2.2.RELEASE
cloud-starter
cloud-starter-aws
spring-jms
spring-cloud-dependencies Hoxton.SR1
amazon-sqs-java-messaging-lib 1.0.8
Problem
My application starts up fine and begins to process messages from Amazon SQS. After some amount of time I see the following warning
2020-02-01 04:16:21.482 LogLevel=WARN 1 --- [ecutor-thread14] o.s.j.l.DefaultMessageListenerContainer : Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers.
The above warning gets printed multiple times and eventually I see the following two INFO messages
2020-02-01 04:17:51.552 LogLevel=INFO 1 --- [ecutor-thread40] c.a.s.javamessaging.SQSMessageConsumer : Shutting down ConsumerPrefetch executor
2020-02-01 04:18:06.640 LogLevel=INFO 1 --- [ecutor-thread40] com.amazon.sqs.javamessaging.SQSSession : Shutting down SessionCallBackScheduler executor
The above 2 messages will display several times and at some point no more messages are consumed from SQS. I don't see any other messages in my log to indicate an issue, but I get no indication from my handlers that they are processing messages (I have 2~), and I can see the AWS SQS queue growing in the number of messages and their age.
~: This exact code was working fine when I had a single handler; this problem started when I added the second one.
Configuration/Code
I realize the first warning is caused by the concurrency of the ThreadPoolTaskExecutor, but I cannot get a configuration which works properly. Here is my current configuration for the JMS stuff. I have tried various max pool sizes with no real effect other than the warnings starting sooner or later depending on the pool size.
public ThreadPoolTaskExecutor asyncAppConsumerTaskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setThreadGroupName("asyncConsumerTaskExecutor");
    taskExecutor.setThreadNamePrefix("asyncConsumerTaskExecutor-thread");
    taskExecutor.setCorePoolSize(10);
    // Allow the thread pool to grow up to 4 times the core size; evidently not
    // having the pool be larger than the max concurrency causes the JMS queue
    // to barf on itself with messages like
    // "Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers"
    taskExecutor.setMaxPoolSize(10 * 4);
    taskExecutor.setQueueCapacity(0); // do not queue up messages
    taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
    taskExecutor.setAwaitTerminationSeconds(60);
    return taskExecutor;
}
Here is the JMS Container Factory we create
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(SQSConnectionFactory sqsConnectionFactory, ThreadPoolTaskExecutor asyncConsumerTaskExecutor) {
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    factory.setConnectionFactory(sqsConnectionFactory);
    factory.setDestinationResolver(new DynamicDestinationResolver());
    // The JMS processor will start 'concurrency' number of tasks
    // and supposedly will increase this to the max of '10 * 3'
    factory.setConcurrency(10 + "-" + (10 * 3));
    factory.setTaskExecutor(asyncConsumerTaskExecutor);
    // Let the task process 100 messages, default appears to be 10
    factory.setMaxMessagesPerTask(100);
    // Wait up to 5 seconds for a timeout, this keeps the task around a bit longer
    factory.setReceiveTimeout(5000L);
    factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
    return factory;
}
I added the setMaxMessagesPerTask and setReceiveTimeout calls based on stuff found on the internet; the problem persists without these and at various settings (50, 2500L, 25, 1000L, etc.).
We create a default SQS connection factory
public SQSConnectionFactory sqsConnectionFactory(AmazonSQS amazonSQS) {
    return new SQSConnectionFactory(new ProviderConfiguration(), amazonSQS);
}
Finally the handlers look like this
@JmsListener(destination = "consumer-event-queue")
public void receiveEvents(String message) throws IOException {
    MyEventDTO myEventDTO = jsonObj.readValue(message, MyEventDTO.class);
    //messageTask.process(myEventDTO);
}

@JmsListener(destination = "myalert-sqs")
public void receiveAlerts(String message) throws IOException, InterruptedException {
    final MyAlertDTO myAlert = jsonObj.readValue(message, MyAlertDTO.class);
    myProcessor.addAlertToQueue(myAlert);
}
You can see in the first function (receiveEvents) we just take the message from the queue and exit; we have not implemented the processing code for that.
The second function (receiveAlerts) gets the message; the myProcessor.addAlertToQueue function creates a runnable object and submits it to a thread pool to be processed at some point in the future.
The problem (the warnings, the INFO messages, and the failure to consume messages) only started when we added the receiveAlerts function; previously the other function was the only one present and we did not see this behavior.
More
This is part of a larger project and I am working on breaking this code out into a smaller test case to see if I can duplicate this issue. I will post a follow-up with the results.
In the Meantime
I'm hoping this is just a config issue and someone more familiar with this can tell me what I'm doing wrong, or that someone can provide some thoughts and comments on how to correct this to work properly.
Thank you!
After fighting this one for a bit I think I finally resolved it.
The issue appears to be due to the "DefaultJmsListenerContainerFactory": this factory creates a new "DefaultMessageListenerContainer" for EACH method with an '@JmsListener' annotation. The person who originally wrote the code thought the factory was only called once for the application and that the created container would be re-used. So the issue was two-fold:
The 'ThreadPoolTaskExecutor' attached to the factory had 40 threads. When the application had one '@JmsListener' method this worked fine, but when we added a second method each method got 10 threads (a total of 20) for listening. That on its own is fine; however, since we stated that each listener could grow up to 30 consumers, we quickly ran out of threads in the pool mentioned above. This caused the "Number of scheduled consumers has dropped below concurrentConsumers limit" error.
This is probably obvious given the above, but I want to call it out explicitly: in the listener factory we set the concurrency to "10-30"; however, all of the listeners have to share that pool. As such, the max concurrency has to be set so that each listener's maximum is small enough that, even if every listener reaches its maximum, the total does not exceed the number of threads in the pool (e.g. if we have 2 '@JmsListener' annotated methods and a pool with 40 threads, then the max value can be no more than 20).
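For illustration, a version of the factory from the question adjusted along those lines (capping each container at 20 consumers so the two listeners together stay within the 40-thread pool; the numbers are specific to our setup) might look like this:

public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(SQSConnectionFactory sqsConnectionFactory, ThreadPoolTaskExecutor asyncConsumerTaskExecutor) {
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    factory.setConnectionFactory(sqsConnectionFactory);
    factory.setDestinationResolver(new DynamicDestinationResolver());
    // two @JmsListener methods share the 40-thread executor,
    // so each container may grow to at most 20 consumers (2 x 20 <= 40)
    factory.setConcurrency("10-20");
    factory.setTaskExecutor(asyncConsumerTaskExecutor);
    factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
    return factory;
}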
Hopefully this might help someone else with a similar issue in the future....
I profiled my Kafka producer Spring Boot application and found many "kafka-producer-network-thread"s running (47 in total), which never stop running, even when no data is being sent.
My application looks a bit like this:
var kafkaSender = KafkaSender(kafkaTemplate, applicationProperties)
kafkaSender.sendToKafka(json, rs.getString("KEY"))
with the KafkaSender:
@Service
class KafkaSender(val kafkaTemplate: KafkaTemplate<String, String>, val applicationProperties: ApplicationProperties) {

    @Transactional(transactionManager = "kafkaTransactionManager")
    fun sendToKafka(message: String, stringKey: String) {
        kafkaTemplate.executeInTransaction { kt ->
            kt.send(applicationProperties.kafka.topic, System.currentTimeMillis().mod(10).toInt(),
                    System.currentTimeMillis().rem(10).toString(), message)
        }
    }

    companion object {
        val log = LoggerFactory.getLogger(KafkaSender::class.java)!!
    }
}
Since I instantiate a new KafkaSender each time I want to send a message to Kafka, I thought a new thread would be created which then sends the message to the Kafka queue.
Currently it looks like a pool of producers is generated, but never cleaned up, even when none of them has anything to do.
Is this behaviour intended?
In my opinion the behaviour should be much like datasource pooling: keep the thread alive for some time, but when there is nothing to do, clean it up.
When using transactions, the producer cache grows on demand and is not reduced.
If you are producing messages on a listener container (consumer) thread, there is a producer for each topic/partition/consumer group. This is required to solve the zombie-fencing problem, so that if a rebalance occurs and the partition moves to a different instance, the transaction id will remain the same and the broker can properly handle the situation.
If you don't care about the zombie fencing problem (and you can handle duplicate deliveries), set the producerPerConsumerPartition property to false on the DefaultKafkaProducerFactory and the number of producers will be much smaller.
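For illustration, a sketch of where that switch lives, assuming the producer factory is declared as a bean in Java config (the bootstrap server, serializers and transaction id prefix are placeholders):

@Bean
public ProducerFactory<String, String> producerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    DefaultKafkaProducerFactory<String, String> pf = new DefaultKafkaProducerFactory<>(props);
    pf.setTransactionIdPrefix("tx-"); // keeps transactions enabled
    // trade zombie fencing for a much smaller producer cache
    pf.setProducerPerConsumerPartition(false);
    return pf;
}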
EDIT
Starting with version 2.8 the default EOSMode is now V2 (aka BETA), which means it is no longer necessary to have a producer per topic/partition/group, as long as the broker version is 2.5 or later.
I am performing some simple tests with ActiveMQ to see how it performs on an unstable network. The first test consists of a producer that sends messages to a remote queue. The message is of type ObjectMessage with serializable content inside (a list of objects).
With a good network everything works correctly, but when I run the same tests using netem to simulate packet loss, delays and corruption, I get the following error when consuming the messages and trying to extract the content of the Message:
2011-03-16 11:59:21,791 ERROR [com.my.MessageConsumer] Failed to build body from bytes. Reason: java.io.StreamCorruptedException: invalid handle value: 017E0007
javax.jms.JMSException: Failed to build body from bytes. Reason: java.io.StreamCorruptedException: invalid handle value: 017E0007
So it seems the message was corrupted while being sent to the remote queue but was stored anyway, and only when it is consumed does the consumer see that the message is corrupted.
After this I will use a local queue and a network connector to forward the messages to the remote queue, which I hope will solve the problem. But I was surprised that there was no kind of validation between the producer and the destination (at least a checksum or something like that) to guarantee correct delivery. Am I doing something wrong, or is this the normal behaviour?
I don't have the code here right now, but it was super simple, just a MessageListener:
public class myMessageConsumer implements MessageListener {
    public void onMessage(Message message) {
        try {
            if (message instanceof ObjectMessage) {
                ObjectMessage myMessage = (ObjectMessage) message;
                List dtoList = (List) myMessage.getObject();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
If the exact code is needed I'll post it when I get back from holidays, but it was exactly like that.
The broker isn't going to validate the contents of each and every message that it processes; that would be a tremendous waste of time and would slow down message dispatch significantly. The client received a bad message and threw a JMSException to indicate that the message contents were corrupted, which should be sufficient for your app to respond correctly.
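For example, the listener could separate the two failure modes instead of swallowing everything (a rough sketch, not tested against your setup):

public class MyMessageConsumer implements MessageListener {
    public void onMessage(Message message) {
        try {
            if (message instanceof ObjectMessage) {
                List dtoList = (List) ((ObjectMessage) message).getObject();
                // ... process dtoList ...
            }
        } catch (JMSException e) {
            // corrupted or unreadable body: log it (or forward it to an error queue) and move on
            e.printStackTrace();
        }
        // a RuntimeException from the business logic is left to propagate so a
        // transacted or CLIENT_ACKNOWLEDGE session can trigger redelivery
    }
}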
Where's your code?
If that exception comes from your code, it seems possible that you've got a bug, for example getting some JMS error while receiving the message but messing up the error handling and trying to process the results anyway. For a test like the one you describe, you'd need a good focus on error handling in your clients.
I don't have experience with ActiveMQ, but it seems very surprising that it would allow corrupt message delivery. Not that I want the JMS implementation to unpack the ObjectMessage to check, just that it should deliver a byte-for-byte uncorrupted copy of what was sent, or error out if it can't.