How to reprocess failed events from Kinesis? - aws-lambda

I have a Lambda producer that calls putRecords on a Kinesis stream. Sometimes while writing to Kinesis I get an Internal Service Failure. What is the best way to handle cases where the Lambda fails to write to Kinesis? I have a retry mechanism on my producer Lambda, but even after the retry attempts it still fails to write in some cases.

A good approach is to use a dead-letter queue (DLQ) with your Lambda function. You can configure an SNS topic or SQS queue as the DLQ, write all the failed events there, and reprocess them later.
https://aws.amazon.com/about-aws/whats-new/2016/12/aws-lambda-supports-dead-letter-queues/
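Note that PutRecords can fail partially: the response reports a FailedRecordCount and a per-record error, so only the failed subset should be retried. A minimal sketch of that retry loop, where `send` stands in for the actual kinesis.putRecords call and returns the sublist of records that failed (the names and shape here are illustrative, not the AWS SDK API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: retry only the records that failed in a PutRecords-style batch call.
// `send` stands in for the Kinesis client call; it returns the sublist of
// records that failed, mirroring PutRecords' per-record failure reporting.
public class BatchRetry {
    public static <T> List<T> sendWithRetry(List<T> records,
                                            Function<List<T>, List<T>> send,
                                            int maxAttempts,
                                            long baseBackoffMillis) throws InterruptedException {
        List<T> pending = new ArrayList<>(records);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            // Each attempt resends only the records that are still failing.
            pending = new ArrayList<>(send.apply(pending));
            if (!pending.isEmpty() && attempt < maxAttempts) {
                // Exponential backoff before the next attempt.
                Thread.sleep(baseBackoffMillis * (1L << (attempt - 1)));
            }
        }
        return pending; // records that still failed after all attempts
    }
}
```

Whatever is still left in the returned list after the final attempt is what you would write to the DLQ for later reprocessing.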

Related

AWS Cloudwatch Subscription Filter and Dead Letter Queue for Lambda

I am using CloudWatch log subscription filters to get logs from a specific log group and send them to a Lambda function, which after processing will send the results to another service. I wonder if there's any possibility to send failed events by Lambda to a Dead Letter Queue, noting that in the above settings, we have no SNS/SQS setup to trigger the Lambda.
Destinations give you the ability to handle the failure of function invocations along with their success. When a function invocation fails, such as when retries are exhausted or the maximum event age has been exceeded, Destinations routes the record of every failed invocation to the destination resource for further investigation or processing.
To configure a destination for a Lambda function, refer to the AWS documentation on Lambda destinations.
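Under the hood this is the PutFunctionEventInvokeConfig API. A hedged CLI sketch, where the function name and queue ARN are placeholders:

```shell
# Route failed asynchronous invocations of my-function to an SQS queue
# (function name and ARN below are placeholders).
aws lambda put-function-event-invoke-config \
  --function-name my-function \
  --maximum-retry-attempts 2 \
  --destination-config '{"OnFailure":{"Destination":"arn:aws:sqs:us-east-1:123456789012:failed-events"}}'
```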

Multiple processes on a single SQS queue within a Lambda function

Currently, we have an SQS queue subscribed to a standard SNS topic, which triggers a Lambda that publishes some data based on these events to a downstream X.
We have come up with another use case where we want to listen to this existing SNS topic and publish another set of data based on these events to downstream Y. In the future we might have yet another use case where we listen to the same SNS topic and publish to downstream Z.
I was wondering if we can reuse this existing SQS queue and Lambda for these new use cases. I am curious how we would handle failure scenarios in case one of the publishes fails. A failure of one process out of x sends the message to the DLQ, from where a redrive would be required, so all the consumer processes within the Lambda would have to process this redriven message again.
Another option would be a separate SQS queue and a separate Lambda for each such use case.
Has someone had a similar problem, and which of the two approaches (or anything else that helps reuse some of the existing infrastructure) did you follow?
You should subscribe multiple Amazon SQS queues to the Amazon SNS topic, so that each of them receives a copy of the message. Each SQS queue can have its own Dead Letter Queue for error handling.
It is also possible to subscribe the AWS Lambda function to the Amazon SNS topic directly, without using an Amazon SQS queue. The Lambda function can send failed messages directly to a Dead Letter Queue.

DLQ redrive failed events back to DynamoDB streams?

I have a DynamoDB stream triggering a Lambda, and I want to push any failed events to a DLQ.
If the source of a DLQ is an SQS queue, it looks like you can do something called a redrive back to the source queue, where messages in DLQ will be moved back to the source queue.
I am guessing that this isn't possible if the source is a DynamoDB stream?
AWS doesn't currently provide a mechanism to replay failed DynamoDB stream records from a DLQ. The messages in the DLQ contain the metadata of the event rather than the actual failed records.
If you need to replay the failed DynamoDB stream records, it can be done with a two-step approach:
Get the shard iterator from the event metadata.
Using the shard iterator, fetch the actual failed records from the DynamoDB stream and process them accordingly.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_GetShardIterator.html
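The two-step replay above can be sketched generically. Here `getShardIterator` and `getRecords` stand in for the DynamoDB Streams GetShardIterator and GetRecords calls; all names and shapes are illustrative, not the SDK API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch of the two-step replay: resolve a shard iterator from the DLQ
// message's metadata, then page through the shard until it is drained.
public class StreamReplay {
    // One page of stream records plus the iterator for the next page
    // (null when the shard is exhausted).
    public record Page(List<String> records, String nextIterator) {}

    public static List<String> replay(String shardId, String sequenceNumber,
                                      Function<String, String> getShardIterator,
                                      Function<String, Page> getRecords) {
        // Step 1: shard iterator from the failure metadata (shard id + sequence number).
        String iterator = getShardIterator.apply(shardId + "@" + sequenceNumber);
        List<String> recovered = new ArrayList<>();
        // Step 2: page through the shard until no further iterator is returned.
        while (iterator != null) {
            Page page = getRecords.apply(iterator);
            recovered.addAll(page.records());
            iterator = page.nextIterator();
        }
        return recovered; // re-run the original handler logic over these records
    }
}
```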

AWS EventBridge Lambda invocation

I have configured a lambda function as EventBridge rule target and I have configured a Dead Letter Queue on the EventBridge rule to capture exceptions.
Now, if the Lambda function fails, EventBridge does not recognize that failure as an error.
Since the EventBridge invocation of the Lambda is asynchronous, EventBridge considers the event successful as soon as it reaches the Lambda service, but this way I am not able to track and retry events when the Lambda fails.
Does anyone know a way to make the EventBridge-to-Lambda request synchronous, or another way to retry events if the Lambda code fails after the invocation?
One option is to make an SQS queue the rule target and use the SQS event to trigger the Lambda. When processing fails, the Lambda doesn't delete the message, so it stays in the queue and is retried automatically after the configured visibility timeout. You can also configure a dead-letter queue that receives messages once they exceed the queue's maximum receive count.
EventBridge guarantees delivery of the event to the Lambda function but is not aware of what happens after that. It's Lambda invocation versus Lambda execution: EventBridge successfully delivered the message to the Lambda service, so it counts as a successful invocation.
For Lambda, EventBridge invokes the function asynchronously, so if that API call returns a success response, EventBridge assumes the delivery was successful. Any failure inside Lambda after that point is not visible to EventBridge. We should configure retries and create DLQs on our Lambda functions to make sure events are not lost when the function fails to execute. We could in fact configure the same DLQ used by EventBridge for the Lambda as well, so that all eventual failures land in a single place.
AWS has a dedicated documentation page for this, which states the following for asynchronous invocation:
Lambda retries function errors twice. If the function doesn't have enough capacity to handle all incoming requests, events might wait in the queue for hours or days to be sent to the function. You can configure a dead-letter queue on the function to capture events that weren't successfully processed. For more information, see Asynchronous invocation.
So that means that as long as your Lambda function's handler returns an error, the Lambda service will retry running your function.
Therefore, you might not need EventBridge to retry the event.
See: Error handling and automatic retries in AWS Lambda
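Configuring such a function-level DLQ for events that still fail after the built-in async retries can be sketched with the CLI; the function name and queue ARN below are placeholders:

```shell
# Capture events that still fail after Lambda's built-in async retries
# (function name and queue ARN are placeholders).
aws lambda update-function-configuration \
  --function-name my-function \
  --dead-letter-config TargetArn=arn:aws:sqs:us-east-1:123456789012:my-function-dlq
```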

Lambda function does not retry the SQS message processing when the Java code throws a RuntimeException

I have a Lambda function written in Java that listens to SQS events and does some processing on those SQS messages.
As per the Lambda documentation, if the Lambda code throws a RuntimeException, Lambda should retry the same message twice before sending it back to the queue. However, I don't see that behavior.
I only see it processing the message just once.
Here is a snippet from the relevant lambda code in this case.
@Override
public Boolean handleRequest(SQSEvent sqsEvent, Context context) {
    try {
        ........some queue message processing.....
    } catch (Exception ex) {
        throw new RuntimeException("exception occurred");
    }
}
Is this not enough for Lambda to retry the message two more times? I checked CloudWatch to see what Lambda logs, and it only has logs from the very first processing, not from any retries.
Can someone tell me what I missed here that makes it not work as expected?
You are missing the throws clause in handleRequest. If you don't have that, Lambda will just swallow the exception:
public Boolean handleRequest(SQSEvent sqsEvent, Context context) throws RuntimeException
Other than that, what Thales Munussi has told you about synchronous polling is absolutely right. When you hook SQS up to Lambda, Lambda polls SQS, which keeps an open connection between the two, making it a synchronous invocation.
As per the AWS documentation, Lambda doesn't retry in such synchronous cases. Setting up a DLQ and retries in SQS itself is your best recourse.
Keep in mind that Lambda sends the message back to the queue after a RuntimeException is thrown in your Java code.
Based on the redrive policy on your SQS queue, SQS will redeliver the same message up to the configured maximum receive count.
Once the Lambda fails to process it that many times, the message is moved from the main queue to the DLQ.
The documentation says it retries two more times only if the invocation is asynchronous. SQS is a poll-based system: Lambda polls the queue, and all of those invocations are synchronous.
For poll-based AWS services (Amazon Kinesis, Amazon DynamoDB, Amazon Simple Queue Service), AWS Lambda polls the stream or message queue and invokes your Lambda function synchronously.
What you can do is configure a DLQ on your source SQS queue, so that when a message fails you can either analyse it further or process it again based on the logic you have configured.
EDIT
The OP is not able to see the messages in the DLQ for some reason. I have attached images to show that it works.
Lambda sqs-test is triggered by a new message in SQS queue sqs-test
These are the Queues (sqs-test-dlq is configured as a DLQ for sqs-test).
This is the code for the Lambda function:
This is the configuration for sqs-test
And this is the redrive policy
After the messages failed in the Lambda function, they were successfully sent to the configured DLQ:
You must be missing some basic configuration because it works seamlessly as the images above show.
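The redrive behavior described in these answers can be sketched as pure logic, with no AWS calls: a message that keeps failing is redelivered until it has been received maxReceiveCount times, then moved to the DLQ. This is a simplified model of the queue mechanics, not the SQS API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified model of SQS redrive semantics: `handler` returns true when a
// message is processed successfully; maxReceiveCount mirrors the queue's
// redrive policy.
public class RedriveSimulator {
    public static List<String> drain(List<String> messages, Predicate<String> handler, int maxReceiveCount) {
        Deque<String> queue = new ArrayDeque<>(messages);
        Map<String, Integer> receiveCount = new HashMap<>();
        List<String> dlq = new ArrayList<>();
        while (!queue.isEmpty()) {
            String msg = queue.poll();
            int count = receiveCount.merge(msg, 1, Integer::sum);
            if (handler.test(msg)) {
                continue; // processed successfully and deleted from the queue
            }
            if (count >= maxReceiveCount) {
                dlq.add(msg); // redrive policy moves it to the DLQ
            } else {
                queue.add(msg); // becomes visible again after the visibility timeout
            }
        }
        return dlq;
    }
}
```

A message that always fails ends up in the returned DLQ list after maxReceiveCount receives, while messages that succeed never do, which matches the behavior shown in the screenshots.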
