DynamoDB stream: some items not sent to Lambda

I have a DynamoDB stream configured to trigger a lambda. Most inserts to the table do go to the lambda, but I have a number of them that seem to never go to the lambda. At the top of my lambda, I log the received event, which includes the keys of the items it received. I am searching for those keys in my CloudWatch logs for the lambda, and they never appear. Is there some way to debug why DynamoDB may not be sending these items, or is it possible that the lambda is polling incorrectly?

Related

How to place DynamoDB records in an SQS queue to trigger Lambda

I have a Lambda that scans through the items in a DynamoDB table and does some post-processing on them. This works fine for now because the table has a small number of entries, but it will soon grow and the 15-minute Lambda timeout will be reached.
I am considering using an SQS queue, but I'm not sure how to place records from the table onto SQS so that it then triggers the Lambda concurrently.
Is this a feasible solution? Or should I just create threads within the Lambda and process the items there? I'm also unsure whether that would still count toward the 15-minute limit.
Any suggestions will be appreciated, thanks
DynamoDB Streams is a perfect fit for this: every item that is added or modified enters the stream and in turn triggers your Lambda function to do the processing. Of course, how well it works strongly depends on your particular use case.
If, for example, you need data from the whole table, you can maintain useful aggregations and keep those aggregates in a single item. Then, instead of having to Scan the table to get all the items, you just do a single GetItem request, which already holds your aggregate data.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
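A rough sketch in Python (boto3) of what the stream-triggered Lambda and the aggregate item could look like — the table name, the aggregate item's key, and the numeric attribute are assumptions, and the stream view type is assumed to include new images:

from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # assumed table name

def handler(event, context):
    # Each record in the batch describes one item-level change from the stream.
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]
        amount = Decimal(new_image["amount"]["N"])  # assumed numeric attribute

        # Maintain a running aggregate in a single, well-known item so that
        # readers can issue one GetItem instead of scanning the whole table.
        table.update_item(
            Key={"pk": "AGGREGATE"},  # assumed key of the aggregate item
            UpdateExpression="ADD #t :a",
            ExpressionAttributeNames={"#t": "total"},
            ExpressionAttributeValues={":a": amount},
        )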
As @LeeHannigan says, use DynamoDB Streams to capture your table's CRUD events. Streams has traditionally had two targets for consuming these change events: Lambda and Kinesis.
But what about an SQS destination? EventBridge Pipes adds EventBridge as another way to consume DynamoDB streams. EB Pipes, a new integration service, would have the DynamoDB stream as its source and SQS as its target.
The flow would be DynamoDB Streams -> EB Pipes -> SQS -> Lambda.
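If you go that route, the consuming Lambda just receives ordinary SQS batches. Assuming no input transformer is configured on the pipe, each message body should be the JSON of one DynamoDB stream record, so a handler sketch might look like this:

import json

def handler(event, context):
    # Standard SQS batch delivered to Lambda by the SQS event source mapping.
    for message in event["Records"]:
        # Assumption: the pipe forwards each stream record unmodified, so the
        # body parses into the familiar DynamoDB stream record shape.
        stream_record = json.loads(message["body"])
        keys = stream_record["dynamodb"]["Keys"]
        print("processing change for keys:", keys)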

AWS Lambda or AWS Batch to delete rows from DynamoDB based on certain criteria

I have a requirement to delete records based on certain criteria; it doesn't need to be real time, and the records are stored in DynamoDB. Can AWS Lambda be scheduled, e.g. run every day at 11pm? Can I package it as a cron job, or is it better to go with AWS Batch? My only worry with Batch is that it has more overhead.
Thanks
If your criterion is time-based, then the best option is to use Time To Live (TTL).
DynamoDB's TTL feature deletes records automatically after they expire.
TTL is useful if you store items that lose relevance after a specific time. The following are example TTL use cases:
Remove user or sensor data after one year of inactivity in an application.
Archive expired items to an Amazon S3 data lake via Amazon DynamoDB Streams and AWS Lambda.
Retain sensitive data for a certain amount of time according to contractual or regulatory obligations.
AWS provides details in this document.
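For illustration, enabling TTL and writing an expiring item could look roughly like this in Python (boto3); the table name, key schema, and the expires_at attribute are assumptions:

import time

import boto3

dynamodb = boto3.client("dynamodb")

# Tell DynamoDB which attribute holds the expiry timestamp (epoch seconds).
dynamodb.update_time_to_live(
    TableName="my-table",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write an item that DynamoDB will delete some time after one year from now.
dynamodb.put_item(
    TableName="my-table",
    Item={
        "pk": {"S": "user#123"},
        "expires_at": {"N": str(int(time.time()) + 365 * 24 * 3600)},
    },
)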
If instead you have more complex criteria, you could use Lambda as you suggested; however, creating a Lambda function alone won't be enough.
Lambda always needs something to kick it off, such as a cron-like schedule.
AWS recommends using CloudWatch Events to schedule Lambda runs. Relevant documentation can be found in the tutorial "Schedule AWS Lambda Functions Using CloudWatch Events".
This process includes the following steps:
Create an AWS Lambda Function to log the scheduled events. Specify this function when you create your rule.
Create a rule in CloudWatch to run your Lambda function on a schedule.
A rule would look like:
aws events put-rule \
  --name my-scheduled-rule \
  --schedule-expression 'rate(5 minutes)'
Verify the rule by checking that your Lambda function was invoked.
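The scheduled function itself could then be a small scan-and-delete Lambda. A minimal Python (boto3) sketch, assuming a 'status' attribute marks the rows to delete and 'pk' is the table's key:

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # assumed table name

def handler(event, context):
    # The 'status' attribute and the 'expired' value are assumptions used to
    # illustrate "delete rows based on certain criteria".
    scan_kwargs = {"FilterExpression": Attr("status").eq("expired")}
    while True:
        page = table.scan(**scan_kwargs)
        # batch_writer groups the deletes into BatchWriteItem calls for us.
        with table.batch_writer() as batch:
            for item in page["Items"]:
                batch.delete_item(Key={"pk": item["pk"]})  # assumed key schema
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]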

How to avoid concurrent requests to a lambda

I have a ReportGeneration Lambda that takes a request from the client and adds the following entries to a DDB table.
Customer ID <hash key>
ReportGenerationRequestID(UUID) <sort key>
ExecutionStartTime
ReportExecutionStatus <workflow status>
I have enabled a DDB stream trigger on this table, and creating an entry in this table triggers the report generation workflow. This is a multi-step workflow that takes a while to complete.
Where ReportExecutionStatus is the status of the report processing workflow.
I am supposed to maintain the history of all report generation requests that a customer has initiated.
What I am trying to do now is avoid concurrent processing requests by the same customer, so if a report for a customer is already being generated, don't create another record in DDB.
Option considered:
Query DDB for the customer ID (consistent read).
From the list, see if any entry is either InProgress or Scheduled.
If not, create a new one (consistent write).
Otherwise, return the already existing one.
Issue: If the customer clicks twice in a split second to generate a report, two Lambdas can be triggered, causing two entries in DDB, and two parallel workflows can be initiated, which is something I don't want.
Can someone recommend the best approach to ensure that there are no concurrent executions (two workflows) for the same report from the same customer?
In short when one execution is in progress another one should not start.
You can use ConditionExpression to only create the entry if it doesn't already exist; if you need to check different items, then you can use DynamoDB Transactions to check whether another item already exists and, if not, create your item.
Those would be the ways to do it with DynamoDB, getting a higher consistency.
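A minimal sketch of the ConditionExpression approach in Python (boto3), using a per-customer marker item as a lock; the table name and the marker value are assumptions, while the attribute names follow the question:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("report-requests")  # assumed table name

def try_start_report(customer_id):
    # The write succeeds only if no item with this primary key exists yet,
    # so two near-simultaneous clicks cannot both create a workflow record.
    try:
        table.put_item(
            Item={
                "CustomerID": customer_id,
                "ReportGenerationRequestID": "IN_PROGRESS",  # assumed marker value
                "ReportExecutionStatus": "Scheduled",
            },
            ConditionExpression="attribute_not_exists(CustomerID)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # a report for this customer is already in flight
        raise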
Another option would be to use SQS FIFO queues. You can group messages by the customer ID, so you wouldn't have concurrent processing of messages for the same customer. Additionally, with this SQS solution you get all the advantages of SQS, such as automated retry mechanisms and a dead-letter queue.
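Sending the report requests into a FIFO queue could look like this sketch (the queue URL and payload shape are assumptions); setting MessageGroupId to the customer ID is what serializes processing per customer:

import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/report-requests.fifo"  # assumed queue

def enqueue_report_request(customer_id):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"customer_id": customer_id}),
        # All messages for one customer share a group, so they are processed
        # one at a time and in order.
        MessageGroupId=customer_id,
        # Reusing the customer ID here also drops duplicate clicks within
        # SQS's 5-minute deduplication window (a deliberate choice in this sketch).
        MessageDeduplicationId=customer_id,
    )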
Limiting the number of concurrent Lambda executions is not possible as far as I know. That is the whole point of AWS Lambda, to easily scale and run multiple Lambdas concurrently.
That said, there is probably a better solution for your problem using a DynamoDB feature called "Strongly Consistent Reads".
By default, reads from DynamoDB (if you use the AWS SDK) are eventually consistent, causing the behaviour you observed: two writes to the same table are made, but your Lambda was only able to notice one of those writes.
If you use Strongly consistent reads, the documentation states:
When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful.
So your Lambda needs to do a strongly consistent read to your table to check if the customer already has a job running. If there is already a job running the Lambda does not create a new job.
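A sketch of that check in Python (boto3), with the attribute names taken from the question and the table name assumed:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("report-requests")  # assumed table name

def has_running_job(customer_id):
    # ConsistentRead=True forces a strongly consistent read, so a write that
    # was committed a moment earlier by another Lambda is visible here.
    response = table.query(
        KeyConditionExpression=Key("CustomerID").eq(customer_id),
        ConsistentRead=True,
    )
    return any(
        item.get("ReportExecutionStatus") in ("InProgress", "Scheduled")
        for item in response["Items"]
    )

Note that a read-then-write check on its own still leaves a small race window between the read and the write; combining it with a ConditionExpression on the write, as in the previous answer, closes that gap.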

Trigger CloudWatch event on DynamoDB epoch field

I'm looking for a way to trigger an event on AWS at a specific time stored in DynamoDB, and I'm wondering whether it's possible. There is a TTL feature for DynamoDB which can trigger an event, but that results in data removal.
So right now I can either use DynamoDB TTL and lose the data (which I want to preserve),
or create a CloudWatch event on every DynamoDB update.
I don't like either of those options; I'd appreciate any input.

Export existing DynamoDB items to Lambda Function

Is there any AWS managed solution which would allow me to perform what is essentially a data migration, using DynamoDB as the source and a Lambda function as the sink?
I’m setting up a Lambda to process DynamoDB streams, and I’d like to be able to use that same Lambda to process all the existing items as well rather than having to rewrite the same logic in a Spark or Hive job for AWS Glue, Data Pipeline, or Batch. (I’m okay with the input to the Lambda being different than a DynamoDB stream record—I can handle that in my Lambda—I’m just trying to avoid re-implementing my business logic elsewhere.)
I know that I could build my own setup to run a full table scan, but I’m also trying to avoid any undifferentiated heavy lifting.
Edit: One possibility is to update all of the items in DynamoDB so that it triggers a DynamoDB Stream event. However, my question still remains—is there an AWS managed service that can do this for me?
You can create a new Kinesis data stream and add it as a trigger to your existing Lambda function. Then create a new, simple Lambda function that scans the entire table and puts the records into this stream. That's it.
Your business logic stays in your original function; you are simply feeding the existing data from DynamoDB into that function via Kinesis.
Ref: https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/
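A rough sketch of that backfill function in Python (boto3); the table name, stream name, and partition key attribute are assumptions, and the payload shape will differ from a real stream record, which the question says is acceptable:

import json

import boto3

dynamodb = boto3.client("dynamodb")
kinesis = boto3.client("kinesis")

TABLE_NAME = "my-table"              # assumed table name
STREAM_NAME = "my-backfill-stream"   # assumed Kinesis stream name

def handler(event, context):
    # Walk the whole table page by page and replay each item into Kinesis,
    # where the existing stream-processing Lambda picks it up.
    paginator = dynamodb.get_paginator("scan")
    for page in paginator.paginate(TableName=TABLE_NAME):
        for item in page["Items"]:
            kinesis.put_record(
                StreamName=STREAM_NAME,
                Data=json.dumps(item),
                PartitionKey=item["pk"]["S"],  # assumed partition key attribute
            )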
