Lambda retry: missing logs in CloudWatch - aws-lambda

I have configured an AWS Lambda function to be triggered when an SQS queue receives a message. If an exception occurs while processing a message, the Lambda should retry once, and if it fails again the message should go to the DLQ.
The configuration is:
on the queue: maxReceiveCount = 2, redrive to the DLQ, VisibilityTimeout = 30 seconds
on my lambda: ReservedConcurrentExecutions = 1, BatchSize = 1, timeout of 10 seconds
link to the sam template
If I make the lambda throw an error like this:
const receiveHandler = async (event) => {
  throw new Error('Booooooom')
};
Then I send a batch of 3 messages:
I see the failed invocations on CloudWatch
then I see 1 retry for each message on CloudWatch
then I receive the 3 messages in the dead-letter queue. Each message has a receiveCount of 3
Conclusion: everything works as expected
So I do another test. I change the code of my lambda so that it will time out:
function delay(milliseconds) {
  return new Promise(resolve => {
    setTimeout(() => { resolve() }, milliseconds);
  })
}
const receiveHandler = async (event) => {
  // wait 20 seconds
  await delay(20000);
};
exports.receiveHandler = receiveHandler;
I purge the dlq, I deploy the new stack, and I send a new batch of 3 messages.
What happens is:
I see in CloudWatch my message 2 start being processed (1st attempt)
then I see my message 1 start being processed (1st attempt)
then I see my message 1 being processed again (retry)
then I see my message 3 start being processed (1st attempt)
All processings ended in timeout as expected
So 2 logs are missing:
retry for message 2
retry for message 3
However, when I poll the dlq, what I see is:
3 messages in the dlq
ReceiveCount = 3 for each message
So, if I believe the count in the DLQ, all my messages have been retried once. Why is CloudWatch missing the logs for 2 of the retries?
Edit: I did 2 successful tests:
if I triple the VisibilityTimeout and I redo the test, I see all my logs
if, instead, I triple the ReservedConcurrency, I also see all my logs
So I think that, with the configuration I set, some of my messages have no room to be retried and are moved to the DLQ instead.
Thank you for your help guys !
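The conclusion in the edit matches the guidance I recall from the AWS Lambda documentation for SQS event sources: set the queue's visibility timeout to at least six times the function timeout, and maxReceiveCount to at least 5, so that throttled invocations have room to be retried. A plausible reading of the missing logs is that the poller received the messages (which increments the receive count) while the single reserved execution was busy, so no invocation was logged. Below is a minimal sketch that checks a configuration against those two rules of thumb; the class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SqsLambdaConfig:
    visibility_timeout_s: int  # queue VisibilityTimeout
    function_timeout_s: int    # Lambda timeout
    max_receive_count: int     # redrive policy maxReceiveCount


def check_config(cfg: SqsLambdaConfig) -> List[str]:
    """Return a warning for each setting below the recommended minimum."""
    warnings = []
    if cfg.visibility_timeout_s < 6 * cfg.function_timeout_s:
        warnings.append(
            "VisibilityTimeout should be at least 6x the function timeout "
            f"({6 * cfg.function_timeout_s}s), got {cfg.visibility_timeout_s}s"
        )
    if cfg.max_receive_count < 5:
        warnings.append(
            f"maxReceiveCount should be at least 5, got {cfg.max_receive_count}"
        )
    return warnings


# Values from the question, then the same config with VisibilityTimeout tripled.
original = SqsLambdaConfig(visibility_timeout_s=30, function_timeout_s=10, max_receive_count=2)
tripled = SqsLambdaConfig(visibility_timeout_s=90, function_timeout_s=10, max_receive_count=2)

for warning in check_config(original):
    print(warning)
```

With the values from the question both checks fail; tripling the VisibilityTimeout to 90 seconds clears the first one, which fits the successful test described in the edit.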

Related

Combine Retry, Circuit breaker and delayed scheduled delivery

I am struggling to test this scenario properly and cannot get the numbers to match. Could you please verify whether this configuration is correct for the scenario below?
When a message reaches the consumer for the first time, I want to retry on these exceptions: WebException, HttpRequestException, RequestTimeoutException, TimeoutException. Once these retries are exhausted, I want to redeliver the message (only for the exceptions above) using the delayed exchange, with a delay of 2 minutes the first time, then 4 minutes, and finally 6 minutes. After 3 redeliveries, redelivery should stop and the message should be pushed to the _error queue.
I want UseMessageRetry() to execute only the first time, not every time the message reaches the consumer through the delayed exchange.
cfg.UseDelayedExchangeMessageScheduler();
cfg.ReceiveEndpoint(rabbitMqConfig.QueueName, e =>
{
    e.PrefetchCount = 20;
    e.UseRateLimit(100, TimeSpan.FromMinutes(3));
    e.UseDelayedRedelivery(p =>
    {
        p.Intervals(TimeSpan.FromMinutes(2), TimeSpan.FromMinutes(4), TimeSpan.FromMinutes(6));
    });
    e.UseCircuitBreaker(cb =>
    {
        cb.TrackingPeriod = TimeSpan.FromMinutes(1);
        cb.TripThreshold = 15;
        cb.ActiveThreshold = 10;
        cb.ResetInterval = TimeSpan.FromMinutes(5);
    });
    e.UseMessageRetry(r =>
    {
        r.Incremental(2, TimeSpan.FromSeconds(3), TimeSpan.FromSeconds(6));
        r.Handle<WebException>();
        r.Handle<HttpRequestException>();
        r.Handle<TimeoutException>();
        r.Handle<RequestTimeoutException>();
    });
    e.Consumer<Consumers.ProductConsumer>(provider);
});

Is there a way to find out whether the queue is empty using rabbit-template

I have a subscriber that collects messages until it reaches a specified limit and then passes the collected messages to a processor to perform some operations. The code works fine; the problem is that the subscriber waits until it has collected the specified number of messages. If there are fewer messages, control never passes to the processor.
For example, say my chunk size is 100. If I have 100 messages, or a multiple of 100, the program works fine. But if I have fewer than 100 messages, or 150, some of the messages are read by the subscriber but never passed to the processor. Is there a way to find out whether the queue is empty using the rabbit template, so that I can check that condition and break out of the loop?
@RabbitListener(id="messageListener",queues = "#{rabbitMqConfig.getSubscriberQueueName()}",containerFactory="queueListenerContainer")
public void receiveMessage(Message message, Channel channel, @Header("id") String messageId,
        @Header("amqp_deliveryTag") Long deliveryTag) {
    LOGGER.info(" Message:"+ message.toString());
    if(messageList.size() < appConfig.getSubscriberChunkSize() ) {
        messageList.add(message);
        deliveryTagList.add(deliveryTag);
        // hand the chunk to the processor only once it is full
        if(messageList.size() == appConfig.getSubscriberChunkSize()) {
            LOGGER.info("------------- Calling Message processor --------------");
            Message [] messageArry = new Message[messageList.size()];
            messageArry = messageList.toArray(messageArry);
            LOGGER.info("message Array Length: "+messageArry.length);
            messageProcessor.process(messageArry);
            messageList = new ArrayList<Message>(Arrays.asList(messageArry));
            LOGGER.info("message Array to List conversion Size: "+messageList.size());
            LOGGER.info("-------------- Completed Message processor -----------");
            eppQ2Publisher.sendMessages(messageList, channel, deliveryTagList);
            messageList.clear();
            deliveryTagList.clear();
        }
    } else {
        // do nothing..
    }
}
There are two ways to achieve this.
Add an @EventListener to listen for ListenerContainerIdleEvents, which are published when no messages have been received for some time; set the container's idleEventInterval property to enable them. The source of the event is the listener container; the event contains the @RabbitListener's id. See Detecting Idle Consumers.
Use RabbitAdmin.getQueueProperties().
You can also call getQueueInfo("queue name").getMessageCount() on a RabbitAdmin instance; it returns 0 for an empty queue.
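To make the idle-event approach concrete, here is a minimal, framework-free sketch of the buffering logic: messages are processed in full chunks, and whatever is left over is flushed when the idle callback fires. `ChunkBuffer` and its methods are hypothetical names, not part of Spring AMQP; in the real listener, `flush()` would be called from the method handling the idle event.

```python
from typing import Callable, List


class ChunkBuffer:
    """Collects messages and hands them to `process` in chunks."""

    def __init__(self, chunk_size: int, process: Callable[[List[str]], None]):
        self.chunk_size = chunk_size
        self.process = process
        self.pending: List[str] = []

    def add(self, message: str) -> None:
        """Buffer a message; process the buffer as soon as it is full."""
        self.pending.append(message)
        if len(self.pending) >= self.chunk_size:
            self.flush()

    def flush(self) -> None:
        """Process whatever is buffered, even if it is a partial chunk."""
        if self.pending:
            chunk, self.pending = self.pending, []
            self.process(chunk)


processed: List[List[str]] = []
buffer = ChunkBuffer(chunk_size=100, process=processed.append)
for i in range(150):
    buffer.add(f"msg-{i}")
buffer.flush()  # stands in for the idle event firing
print([len(chunk) for chunk in processed])  # [100, 50]
```

The point is that the partial chunk is released by a timer-like signal rather than by asking the broker whether the queue is empty, which avoids a race with messages that arrive just after the check.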

Issue or confusion with JMS/spring/AMQ not processing messages asynchronously

We have a situation where we set up a component to run batch jobs using spring batch remotely. We send a JMS message with the job xml path, name, parameters, etc. and we wait on the calling batch client for a response from the server.
The server reads the queue and calls the appropriate method to run the job and return the result, which our messaging framework does by:
this.jmsTemplate.send(queueName, messageCreator);
this.LOGGER.debug("Message sent to '" + queueName + "'");
try {
final Destination replyTo = messageCreator.getReplyTo();
final String correlationId = messageCreator.getMessageId();
this.LOGGER.debug("Waiting for the response '" + correlationId + "' back on '" + replyTo + "' ...");
final BytesMessage message = (BytesMessage) this.jmsTemplate.receiveSelected(replyTo, "JMSCorrelationID='"
+ correlationId + "'");
this.LOGGER.debug("Response received");
Ideally, we want to be able to call our runJobSync method twice and have two jobs run simultaneously. We have a unit test that does something similar, without jobs. I realize this code isn't very great, but here it is:
final List result = Collections.synchronizedList(new ArrayList());
Thread thread1 = new Thread(new Runnable(){
    @Override
    public void run() {
        client.pingWithDelaySync(1000);
        result.add(Thread.currentThread().getName());
    }
}, "thread1");
Thread thread2 = new Thread(new Runnable(){
    @Override
    public void run() {
        client.pingWithDelaySync(500);
        result.add(Thread.currentThread().getName());
    }
}, "thread2");
thread1.start();
Thread.sleep(250);
thread2.start();
thread1.join();
thread2.join();
Assert.assertEquals("both thread finished", 2, result.size());
Assert.assertEquals("thread2 finished first", "thread2", result.get(0));
Assert.assertEquals("thread1 finished second", "thread1", result.get(1));
When we run that test, thread 2 completes first since it has just a 500 millisecond wait, while thread 1 does a 1 second wait:
Thread.sleep(delayInMs);
return result;
That works great.
When we run two remote jobs in the wild, one which takes about 50 seconds to complete and one which is designed to fail immediately and return, this does not happen.
Start the 50 second job, then immediately start the instant fail job. The client prints that we sent a message requesting that the job run, the server prints that it received the 50 second request, but waits until that 50 second job is completed before handling the second message at all, even though we use the ThreadPoolExecutor.
We are running transactional with Auto acknowledge.
Doing some remote debugging, the Consumer from AbstractPollingMessageListenerContainer shows no unhandled messages (so consumer.receive() obviously just returns null over and over). The web GUI for the AMQ broker shows 2 enqueues, 1 dequeue, 1 dispatched, and 1 in the dispatched queue. This suggests to me that something is preventing AMQ from letting the consumer "have" the second message. (prefetch is 1000 btw)
This shows as the only consumer for the particular queue.
A few other developers and I have poked around for the last few days and are getting pretty much nowhere. Any suggestions on what we have misconfigured, if this is expected behavior, or on what might be broken here?
Does the method that is being remotely called matter at all? Currently the job handler method uses an executor to run the job in a different thread and does a future.get() (the extra thread is for reasons related to logging).
Any help is greatly appreciated
Not sure I follow completely, but off the top of my head, you should try the following:
set concurrentConsumers/maxConcurrentConsumers higher than the default (1) on the MessageListenerContainer
set the prefetch to 0 to better promote balancing messages between consumers, etc.
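The effect of the consumer count can be reproduced without any broker. The sketch below uses plain Python threads and an in-memory queue (not Spring JMS or ActiveMQ) to show that a single consumer handles a slow job and a fast job strictly in order, while two consumers let the fast job finish first. `run_jobs` and the job names are made up for illustration.

```python
import queue
import threading
import time
from typing import List


def run_jobs(consumer_count: int) -> List[str]:
    """Queue a slow job, then a fast one; return names in completion order."""
    jobs = queue.Queue()
    finished: List[str] = []
    lock = threading.Lock()

    def consumer() -> None:
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work for this consumer
                return
            name, seconds = job
            time.sleep(seconds)  # simulate the job running
            with lock:
                finished.append(name)

    workers = [threading.Thread(target=consumer) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    jobs.put(("slow", 0.5))
    jobs.put(("fast", 0.0))
    for _ in workers:
        jobs.put(None)  # one sentinel per consumer
    for worker in workers:
        worker.join()
    return finished


print(run_jobs(1))  # ['slow', 'fast']
print(run_jobs(2))  # ['fast', 'slow']
```

With one consumer the second message is not even looked at until the first handler returns, which is the behavior described in the question.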

Specifying timeout for reading messages from activemq queue using camel

I am using camel to read messages from an activemq queue, process it and post it to another queue. The route looks as follows:
from("jms:incoming.queue")
.process(new MyProcessor())
.to("jms:outgoing.queue");
I need to specify a timeout such that if there are no messages in "incoming.queue" for more than 3 minutes, I would like to stop the route. I can use OnCompletion() but it gets called after each message. I can specify timeout for sending message to "outgoing.queue". Is there a way I can specify a timeout such that if there are no message for more than 3 minutes in the "incoming.queue", I can stop the route?
Thanks in advance for your help.
two options I can think of...
use a CronScheduledRoutePolicy to start/stop your route automatically at specified times...
CronScheduledRoutePolicy myPolicy = new CronScheduledRoutePolicy();
myPolicy.setRouteStartTime("0 20 * * * ?");
myPolicy.setRouteStopTime("0 0 * * * ?");
from("jms:incoming.queue")
    .routePolicy(myPolicy).noAutoStartup()
    .process(new MyProcessor())
    .to("jms:outgoing.queue");
use a camel-quartz route and a polling consumer to drain the queue on a schedule
MyCoolBean cool = new MyCoolBean();
cool.setProducer(template);
cool.setConsumer(consumer);
from("quartz://myGroup/myTimerName?cron=0+20+*+*+*+?")
    .bean(cool);
//MyCoolBean snippet
while (true) {
    // receive the message from the queue, wait at most 60s
    Object msg = consumer.receiveBody("jms:incoming.queue", 60000);
    if (msg == null) {
        break;
    }
    producer.sendBody("jms:outgoing.queue", msg);
}
Based on your comment above it appears you are just looking to start and stop the route on a schedule. You can use a quartz job to call the start and stop methods on your jms route. You could even make the quartz logic a route as well using the quartz endpoint if you like.
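The polling-consumer idea from the second option boils down to "receive with a timeout, and stop as soon as one receive comes back empty". Here is a minimal sketch of that loop using Python's standard in-memory queue as a stand-in for the JMS queue; `drain_until_idle` is a hypothetical helper, and the 0.1 second timeout stands in for the 3 minutes from the question.

```python
import queue
from typing import Any, Callable


def drain_until_idle(
    source: queue.Queue,
    handle: Callable[[Any], None],
    idle_timeout_s: float,
) -> int:
    """Handle messages until none arrives within idle_timeout_s; return the count."""
    handled = 0
    while True:
        try:
            message = source.get(timeout=idle_timeout_s)
        except queue.Empty:
            return handled  # idle for too long: stop consuming
        handle(message)
        handled += 1


incoming = queue.Queue()
for body in ("a", "b", "c"):
    incoming.put(body)

outgoing = []
count = drain_until_idle(incoming, outgoing.append, idle_timeout_s=0.1)
print(count, outgoing)  # 3 ['a', 'b', 'c']
```

Because the timeout restarts on every receive, the loop ends only after a full idle period with no messages, not a fixed time after it started.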

Azure Worker: Read a message from the Azure queue in a mutex way

The run method of my worker role is:
public override void Run()
{
    Message msg=null;
    while (true)
    {
        msg = queue.GetMessage();
        if(msg!=null && msg.DequeueCount==1){
            //delete message
            ...
            //execute operations
            ...
        }
        else if(msg!=null && msg.DequeueCount>1){
            //delete message
            ...
        }
        else{
            int randomTime = ...
            Thread.Sleep(randomTime);
        }
    }
}
For performance tests, I would like each message to be processed by only one worker (I am not considering worker failures).
But my tests seem to show that two workers can pick up the same message and both read a DequeueCount equal to 1. Is that possible?
Is there a way to let just one worker read a message, in a "mutex" way?
How is your "getAMessage(queue)" method defined? If you call PeekMessage(), the message remains visible to all workers. If you call GetMessage(), only the worker that gets it first receives the message, but only for the duration of the invisibility timeout, either the one you specify or the default (30 sec.). You have to delete the message before the invisibility timeout expires.
Check out the Queue Service API for more information. I am sure that there is something wrong in your code. I use queues and they behave as documented, both in dev storage and in production storage. You may want to explicitly set a higher visibility timeout when you call GetMessage, and make sure you do not sleep longer than the visibility timeout.
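The visibility-timeout behavior described above can be modelled in a few lines. This is a simplified in-memory sketch, not the Azure SDK: `VisibilityQueue`, `QueuedMessage`, and the explicit `now` argument are all made up so that the example is deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedMessage:
    body: str
    dequeue_count: int = 0
    invisible_until: float = 0.0


class VisibilityQueue:
    """Toy queue: a retrieved message is hidden for visibility_timeout_s."""

    def __init__(self, visibility_timeout_s: float = 30.0):
        self.visibility_timeout_s = visibility_timeout_s
        self.messages: List[QueuedMessage] = []

    def put(self, body: str) -> None:
        self.messages.append(QueuedMessage(body))

    def get_message(self, now: float) -> Optional[QueuedMessage]:
        """Return the first visible message, hiding it and bumping its count."""
        for message in self.messages:
            if message.invisible_until <= now:
                message.dequeue_count += 1
                message.invisible_until = now + self.visibility_timeout_s
                return message
        return None

    def delete(self, message: QueuedMessage) -> None:
        self.messages.remove(message)


q = VisibilityQueue(visibility_timeout_s=30.0)
q.put("job-1")

first = q.get_message(now=0.0)       # worker A gets the message
count_seen_by_a = first.dequeue_count
second = q.get_message(now=10.0)     # worker B sees nothing: still hidden
third = q.get_message(now=31.0)      # timeout expired without a delete
print(count_seen_by_a, second, third.dequeue_count)  # 1 None 2
```

In this model a second worker can only receive the message after the timeout has expired, and then it sees a dequeue count of 2, not 1. That is why deleting the message before the timeout, or using a longer timeout, matters.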
