Trigger CloudWatch event on DynamoDB epoch field - aws-lambda

I'm looking for a way to trigger an event on AWS at a time stored in a DynamoDB item, and I'm wondering whether that's possible. There is a TTL feature for DynamoDB which can trigger an event, but it results in data removal.
So right now I can either use DynamoDB TTL and lose the data (which I want to preserve),
or create a CloudWatch event on every DynamoDB update.
I don't like either of those options; I'd appreciate any input.

Related

How to place DynamoDB records in an SQS queue to trigger Lambda

I have a Lambda that scans through the items in a DynamoDB table and does some post-processing on them. This works fine now because the table has a small number of entries, but it will soon grow and the 15-minute timeout will be reached.
I am considering using SQS, but I'm not sure how to place records from the table onto the queue so that they trigger the Lambda concurrently.
Is this a feasible solution? Or should I just create threads within the Lambda and process the items there? Again, I'm unsure whether that would still count towards the 15-minute limit.
Any suggestions will be appreciated, thanks
DynamoDB Streams is a perfect fit for this: every item added or modified enters the stream and in turn triggers the Lambda function that does the processing, though of course the details depend on your particular use case.
If, for example, you need all the data from the table, you can maintain useful aggregations and store those aggregates in a single item. Then, instead of having to Scan the table to get all the items, you do a single GetItem request that already holds your aggregate data.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
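As a rough illustration of the aggregate idea, here is a minimal Python (boto3) sketch of a stream-triggered handler that keeps a running count in a single item; the table name, key attribute, and aggregate key are placeholders, not anything from the original answer:

import boto3

dynamodb = boto3.client("dynamodb")

TABLE_NAME = "my-table"                     # placeholder table name
AGGREGATE_KEY = {"pk": {"S": "AGGREGATE"}}  # placeholder key of the single aggregate item

def handler(event, context):
    # Each record in a DynamoDB Streams event describes one item-level change.
    inserts = sum(1 for r in event["Records"] if r["eventName"] == "INSERT")
    if inserts:
        # Maintain the aggregate in one item, so readers can GetItem it
        # instead of scanning the whole table.
        dynamodb.update_item(
            TableName=TABLE_NAME,
            Key=AGGREGATE_KEY,
            UpdateExpression="ADD itemCount :n",
            ExpressionAttributeValues={":n": {"N": str(inserts)}},
        )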
As @LeeHannigan says, use DynamoDB Streams to capture your table's CRUD events. Streams has traditionally had two targets for consuming these change events: Lambda and Kinesis.
But what about an SQS destination? EventBridge Pipes adds another way to consume DynamoDB Streams. EB Pipes, a newer integration service, would have the DynamoDB stream as its source and SQS as its target.
The flow would be DynamoDB Streams -> EB Pipes -> SQS -> Lambda.
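A minimal boto3 sketch of creating such a pipe, under the assumption that the stream, queue, and role already exist; all ARNs here are placeholders, and the role needs read access to the stream and send access to the queue:

import boto3

pipes = boto3.client("pipes")

pipes.create_pipe(
    Name="ddb-stream-to-sqs",
    # Placeholder role and ARNs.
    RoleArn="arn:aws:iam::123456789012:role/my-pipe-role",
    Source="arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/2023-01-01T00:00:00.000",
    SourceParameters={
        "DynamoDBStreamParameters": {"StartingPosition": "TRIM_HORIZON", "BatchSize": 10}
    },
    Target="arn:aws:sqs:us-east-1:123456789012:my-queue",
)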

AWS Lambda or AWS Batch to delete rows from DynamoDB based on certain criteria

I have a requirement to delete records based on certain criteria; it doesn't need to be real time. The records are stored in DynamoDB. Can AWS Lambda be scheduled, e.g. to run every day at 11pm, or can I package a cron job? Or is it better to go with AWS Batch? My only worry with Batch is that it has more overhead.
Thanks
If your criterion is time, then the best option is to use Time To Live (TTL).
The DynamoDB TTL feature deletes records after they expire.
TTL is useful if you store items that lose relevance after a specific time. The following are example TTL use cases:
Remove user or sensor data after one year of inactivity in an application.
Archive expired items to an Amazon S3 data lake via Amazon DynamoDB Streams and AWS Lambda.
Retain sensitive data for a certain amount of time according to contractual or regulatory obligations.
AWS provides details in this document.
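A minimal boto3 sketch of enabling TTL on an epoch-seconds attribute; the table name and attribute name are placeholders:

import time
import boto3

dynamodb = boto3.client("dynamodb")

# Turn on TTL for the table, keyed on an epoch-seconds attribute named "expireAt".
dynamodb.update_time_to_live(
    TableName="my-table",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expireAt"},
)

# DynamoDB deletes an item in the background some time after its "expireAt"
# timestamp passes; here the item expires roughly 30 days from now.
dynamodb.put_item(
    TableName="my-table",
    Item={
        "pk": {"S": "record#123"},
        "expireAt": {"N": str(int(time.time()) + 30 * 24 * 3600)},
    },
)

Because TTL deletion happens in the background rather than at the exact expiry instant, it suits the "doesn't need to be real time" requirement.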
If instead you have more complex criteria, you could use Lambda as you suggested; however, creating a Lambda function alone won't be enough.
A Lambda function always needs something to kick it off, like a cron job.
AWS recommends using CloudWatch Events to schedule Lambda runs. Relevant documentation can be found in the tutorial "Schedule AWS Lambda Functions Using CloudWatch Events".
This process includes the following steps:
Create an AWS Lambda Function to log the scheduled events. Specify this function when you create your rule.
Create a rule in CloudWatch to run your Lambda function on a schedule.
A rule would look like:
aws events put-rule \
  --name my-scheduled-rule \
  --schedule-expression 'rate(5 minutes)'
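The rule alone does not invoke anything; the tutorial also has you attach the function as a target and allow the events service to call it. A hedged boto3 sketch of those two steps, with the rule name taken from above and the function name and ARN as placeholders:

import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

RULE_NAME = "my-scheduled-rule"
FUNCTION_NAME = "my-cleanup-function"  # placeholder
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-cleanup-function"

# Point the rule at the Lambda function...
events.put_targets(Rule=RULE_NAME, Targets=[{"Id": "1", "Arn": FUNCTION_ARN}])

# ...and grant CloudWatch Events / EventBridge permission to invoke it.
rule_arn = events.describe_rule(Name=RULE_NAME)["Arn"]
lambda_client.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId="allow-scheduled-rule",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn,
)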
Verify the rule by checking that your Lambda function was invoked.

Redrive data in Kinesis stream to Lambda function

I have a very simple Lambda right now that is triggered by Kinesis, and it's all hooked up and working fine... but I want to work through the case where I find a bug in my Lambda code and need to re-run data that is still available in the stream (my stream is set up to retain data for 7 days).
Is there an easy way to do this? I was hoping there would be something in the console to "reset" the sequence position for the lambda but I couldn't find that.
One method I've tested is to delete the original trigger and add a new one with the position set to TRIM_HORIZON, but I'm wondering if there's an easier way to do this (my original trigger was set up with LATEST).
If you have to reprocess all the data from the Kinesis stream, there is no way other than recreating the trigger.
The starting position can't be updated for an existing trigger; only certain properties can be changed with the UpdateEventSourceMapping command.
Updating the starting position on an event source mapping would impact the committed checkpoints in Kinesis, since the mapping determines how many records are processed and where the current position is committed. Whenever a new trigger is created, it starts checkpointing record processing from Kinesis afresh.
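For reference, a boto3 sketch of deleting and recreating the mapping with TRIM_HORIZON; the function name and stream ARN are placeholders:

import boto3

lambda_client = boto3.client("lambda")

FUNCTION_NAME = "my-kinesis-consumer"  # placeholder
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"  # placeholder

# Remove the existing trigger (deletion completes asynchronously, so you may
# need to wait for it to finish before creating the new mapping).
mappings = lambda_client.list_event_source_mappings(
    FunctionName=FUNCTION_NAME, EventSourceArn=STREAM_ARN
)["EventSourceMappings"]
for m in mappings:
    lambda_client.delete_event_source_mapping(UUID=m["UUID"])

# Recreate it starting from the oldest record still retained in the stream.
lambda_client.create_event_source_mapping(
    FunctionName=FUNCTION_NAME,
    EventSourceArn=STREAM_ARN,
    StartingPosition="TRIM_HORIZON",
    BatchSize=100,
)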

DynamoDB stream some items not sent to lambda

I have a DynamoDB stream configured to trigger a Lambda. Most inserts to the table do reach the Lambda, but a number of them seem never to arrive. At the top of my Lambda, I log the received event, which includes the keys of the items it received. I am searching for those keys in the Lambda's CloudWatch logs, and they never appear. Is there some way to debug why DynamoDB may not be sending these items, or is it possible that the Lambda is polling incorrectly?

Export existing DynamoDB items to Lambda Function

Is there any AWS managed solution which would allow be to perform what is essentially a data migration using DynamoDB as the source and a Lambda function as the sink?
I’m setting up a Lambda to process DynamoDB streams, and I’d like to be able to use that same Lambda to process all the existing items as well rather than having to rewrite the same logic in a Spark or Hive job for AWS Glue, Data Pipeline, or Batch. (I’m okay with the input to the Lambda being different than a DynamoDB stream record—I can handle that in my Lambda—I’m just trying to avoid re-implementing my business logic elsewhere.)
I know that I could build my own setup to run a full table scan, but I’m also trying to avoid any undifferentiated heavy lifting.
Edit: One possibility is to update all of the items in DynamoDB so that it triggers a DynamoDB Stream event. However, my question still remains—is there an AWS managed service that can do this for me?
You can create a new Kinesis data stream and add it as a trigger to your existing Lambda function. Then create a new, simple Lambda function that scans the entire table and puts the records into that stream. That's it.
Your business logic stays in your original function; you are just sending the existing data from DynamoDB to it via Kinesis.
Ref: https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/
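A sketch of what that backfill function might look like in Python, assuming a string partition key named "pk"; the table and stream names are placeholders:

import json
import boto3

TABLE_NAME = "my-table"             # placeholder
STREAM_NAME = "my-backfill-stream"  # placeholder

dynamodb = boto3.client("dynamodb")
kinesis = boto3.client("kinesis")

def handler(event, context):
    # Page through the whole table and forward every item to the stream.
    paginator = dynamodb.get_paginator("scan")
    for page in paginator.paginate(TableName=TABLE_NAME):
        records = [
            {
                "Data": json.dumps(item).encode("utf-8"),
                "PartitionKey": item["pk"]["S"],  # assumed string partition key
            }
            for item in page["Items"]
        ]
        # PutRecords accepts at most 500 records per request.
        for i in range(0, len(records), 500):
            kinesis.put_records(StreamName=STREAM_NAME, Records=records[i:i + 500])

Note that for a very large table the scan itself can approach the 15-minute Lambda limit, so the same code may be easier to run as a one-off outside Lambda.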
