API waiting for a specific record on DynamoDb without polling - events

I am inheriting a workflow that has a reasonable amount of data stored in DynamoDb. The data is periodically refreshed by Lambdas calling third parties when needed. The lambdas are triggered by both SQS and DynamoDB streams and go through four or five steps before the data is updated.
I'm given the task to write an API that can forcibly update N items and return their status. The obvious way to do this without reinventing the wheel and honoring DRY is to trigger an event that spawns off a refresh for each item so that the lambdas can do their thing.
The trouble is that I'm not sure of the best pub/sub approach for being notified that the end state of each workflow has been reached. Do I read from an update/insert stream of DynamoDB to see if the records are updated? Do I create some sort of pub/sub model, like Redis or SNS, to listen for the end state of each Lambda being triggered?
Since I'm writing a REST API, timeouts are fine if there are failures along the line. But at the same time I want to make sure I can handle the following:
Be guaranteed that I can be notified that an update occurred for my targets after my call (in the case of multiple forced updates being called at once, I only care about the first one to arrive).
Not be bogged down by listening for record updates that are not contextually relevant to the API call in question.
Have an amortized time complexity of O(1).
In other words, in terms of the CAP theorem, I care about C and A but not P (because a 502 isn't that big a deal). But getting the timing wrong or missing a subscription is a problem.
I know I can just listen to a DynamoDB event stream, but I'm concerned that when things get noisy there will be more irrelevant stuff slowing me down. And I'm not sure if having every single record get its own topic is scalable (or how messy that would be).

You can use DynamoDB streams in combination with Lambda Event Filtering so the Lambda function only executes for the relevant change you are interested in. More information is available here:
https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-event-filtering-amazon-sqs-dynamodb-kinesis-sources/
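As a rough illustration of what such a filter could look like, here is a minimal sketch that builds the FilterCriteria document for a DynamoDB stream event source. The attribute name refreshStatus and the value "DONE" are assumptions made up for this example; they are not from the question.

```python
import json


def build_filter_criteria(tracked_status: str) -> dict:
    """Build a FilterCriteria document for a DynamoDB stream event source.

    'refreshStatus' is a hypothetical attribute assumed to be written by the
    last step of the refresh workflow.
    """
    pattern = {
        # Only item updates, not inserts or deletes.
        "eventName": ["MODIFY"],
        "dynamodb": {
            "NewImage": {
                # Match only records whose new image carries the end state.
                "refreshStatus": {"S": [tracked_status]},
            }
        },
    }
    # Each filter pattern is passed as a JSON string.
    return {"Filters": [{"Pattern": json.dumps(pattern)}]}


criteria = build_filter_criteria("DONE")
print(json.dumps(criteria, indent=2))
```

The resulting dictionary is the shape expected by the FilterCriteria setting of a Lambda event source mapping, so records that do not match never invoke the function.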

Related

DynamoDb re-processing records

I just inherited someone else's code that uses a serverless Lambda function to process records from DynamoDb. The original developer is using DynamoDb much like how RabbitMQ works: as a temporary staging area with some level of fault tolerance and a Lambda function that will process them at a later date.
We currently have a way to delay message publication in RabbitMQ at my company, but this feature is missing on the AWS side of the fence.
I wrote some code in my serverless lambda function so that it checks a special property called ProcessAfter (UTC DateTime) and effectively skips processing any given DynamoDb record if the current UTC date/time is less than that specified by the ProcessAfter. However DynamoDb never sends me that record ever again. It appears that DynamoDb only ever allows a single attempt at processing a record (excluding the exception re-tries built in), so I'm stuck with my attempted solution to implementing a delay capability.
Is there any way to replicate the delay functionality in DynamoDb, or in my Lambda function, so that messages are skipped and then re-processed as often as necessary until the delay is over and the record is successfully processed?
It looks like you are listening to DynamoDB Streams. They work like this: whenever a configured event (insert, update, etc.) happens for a record, it is sent to a listener for processing.
For your specific scenario, you need an SQS queue in place to process a record later if you do not want to process it as soon as the stream event arrives.
The better architecture I would advise is to add an extra SQS queue and Lambda. The first Lambda listens to the DynamoDB stream event, compares ProcessAfter with the current time to compute the delay, sets that delay as DelaySeconds and sends the message to SQS.
Finally, a second Lambda listens to the queue and processes the message after the specified delay, or with 0 delay, as required.
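The delay computation described above could be sketched as follows. The message body layout and the function names are assumptions for illustration. One real constraint worth encoding is that SQS caps DelaySeconds at 900 seconds (15 minutes), so a longer delay has to be clamped and the consumer has to re-check ProcessAfter and re-enqueue if it is still too early.

```python
import json
import math
from datetime import datetime, timedelta, timezone

SQS_MAX_DELAY_SECONDS = 900  # SQS does not accept a DelaySeconds above 15 minutes


def compute_delay_seconds(process_after: datetime, now: datetime) -> int:
    """Seconds to wait before processing, clamped to what SQS accepts."""
    remaining = math.ceil((process_after - now).total_seconds())
    return max(0, min(SQS_MAX_DELAY_SECONDS, remaining))


def build_delayed_message(record_id: str, process_after: datetime, now: datetime) -> dict:
    """Keyword arguments for an SQS send_message call (QueueUrl left to the caller)."""
    return {
        "MessageBody": json.dumps(
            {"recordId": record_id, "processAfter": process_after.isoformat()}
        ),
        "DelaySeconds": compute_delay_seconds(process_after, now),
    }


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(build_delayed_message("r-1", now + timedelta(seconds=30), now))
```

Per-message delays like this apply to standard queues; with a boto3 SQS client the dictionary would be passed as sqs.send_message(QueueUrl=..., **message).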

How to use Event-Driven architecture to remove "api-based lambda calling another lambda" anti-pattern?

Suppose I have an API, POST /order, which invokes a PlaceOrder Lambda and expects a response from it. The PlaceOrder Lambda does some work, invokes another Lambda, ProcessPayment, and expects a response. ProcessPayment in turn invokes a CreateInvoice Lambda and expects a response. The whole architecture is a request-response cycle. I would like to achieve this without one Lambda invoking another, as that is considered an anti-pattern. My question is: what is the best design pattern to achieve this behavior within 29 seconds with an event-driven architecture?
What AWS suggests: As per this official documentation, they suggest using SQS. But regarding using SQS, I have some thoughts.
My thoughts:
With an event-source architecture, I can orchestrate these Lambdas with SQS, SNS or other event sources, but in that case the flow would not be synchronous and thus I would not get a response from the API.
My other solution:
Using Step Functions: I can orchestrate this workflow with a step function, and I think it is a more elegant solution for this synchronous calling case. But I would like to achieve this via event sources.
How can I design this scenario with best practices using an event-based architecture?
In an Event-Driven Architecture, the communication between producers and consumers is asynchronous by design, that's the way the architecture scales.
You can get nearly synchronous communication between 2 services in an EDA by creating dedicated queues / channels to communicate between them and making sure they're scaled up to a level where the latency is acceptable (close to synchronous values).
This adds some complexity, because the services which need responses have to wait in a hot loop to get them as soon as possible, and if messages are lost, you also need to have retry policies, etc.
I think you need to focus more on the mechanics of your program and a bit less on design patterns. You need to use the design patterns that fit your use-case, the other way around will not work. In the end, you build a program to fulfill a certain task or set of tasks, so that should be your end goal.
You’re stating that you have a process order Lambda, a create invoice Lambda and a process payment Lambda. I’d say the most interesting question is what you need to get done before you return a response to the user. Maybe you can process the order, respond to the user that it is done and handle the invoicing and payments at a later moment. Typically that would mean you put a message in an SQS queue or on an SNS topic.
It could be that you need your payment to be processed before you respond to the user, because they need to be informed about the status of the payment. You could then combine both actions in a single Lambda, because there is no way to split the two tasks from one another. Keep in mind that often another option exists, where you process the order first, put a message in a queue for the payment processing (as it typically is a process that involves a third party) and have the front end poll for an update on the payment status. This way you can return a response quickly and still give an update on the payment as soon as possible.
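To make the "respond quickly, process payment later, let the front end poll" option concrete, here is a minimal in-memory sketch. The queue.Queue and the dictionary are stand-ins for an SQS queue and a status table, and all function names and status values are made up for illustration.

```python
import queue
import uuid

payment_queue = queue.Queue()  # stand-in for an SQS queue
order_status = {}              # stand-in for a table holding order state


def place_order(order: dict) -> dict:
    """Handle POST /order: record the order, enqueue the payment, return at once."""
    order_id = str(uuid.uuid4())
    order_status[order_id] = "PAYMENT_PENDING"
    payment_queue.put({"orderId": order_id, "amount": order["amount"]})
    return {"statusCode": 202, "orderId": order_id}


def process_next_payment() -> None:
    """Queue consumer: the third-party payment call would happen here."""
    message = payment_queue.get_nowait()
    order_status[message["orderId"]] = "PAID"


def get_order_status(order_id: str) -> dict:
    """Handle the status endpoint that the front end polls."""
    return {"orderId": order_id, "status": order_status[order_id]}


response = place_order({"amount": 10})
print(get_order_status(response["orderId"]))  # still pending
process_next_payment()
print(get_order_status(response["orderId"]))  # now paid
```

The API call returns as soon as the message is queued, so the 29-second limit only has to cover the order step, not the payment.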
The create invoice process is typically something you would never want to invoke synchronously during order confirmation. What if your invoicing application (internal or external) is down? Theoretically you could still process orders as long as you create the invoice at some later moment in time. If you couple everything together, you make order confirmation dependent on your invoice creation process, which I would regard as an unnecessary dependency.
I would really advise against step functions for this use-case. They can be utilized for long-running processes that need to keep state and ‘wake up’ at specific moments, but for this specific flow I would say they do not help and are unnecessarily complex. If you have 3 things you need to do that you cannot separate from one another, just run them in the same Lambda.

Invoking 1 AWS Lambda with API Gateway sequentially

I know there's a question with the same title, but my question is a little different: I have a Lambda API, saveInputAPI(), that saves a value into a specified field. Users can invoke this API with different parameters, for example:
saveInput({"adressType",1}); //adressType is a DB field.
or
saveInput({"name","test"}) //name is a DB field.
And of course, this is hosted on AWS, so I'm using API Gateway as well. But the problem is that sometimes an error like this happens:
As you can see, API No. 19 was invoked first but ended up finishing later
(10:10:16:828) -> (10:10:18:060)
While API No.18 was invoked later but finished sooner...
(10:10:17:611) -> (10:10:17:861)
This leads to a lot of problems in my project, and sometimes the delay between the two API calls was up to 10 seconds. The front-end project acts independently, so users don't know what happens behind the scenes. They think they have set addressType to 1 but in reality addressType is still 2. Since this project is large, I cannot change this [using only 1 API to update a DB value] design. Is there any way for me to fix this problem? I'd really appreciate any ideas. Thanks
If updates to the database can't simply be skipped when the last-updated timestamp is more recent than the source event timestamp, we need to decouple API Gateway and Lambda:
API Gateway writes to an SQS FIFO queue.
A Lambda consumes the SQS queue and processes the requests.
This ensures that the older event is processed first.
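For the FIFO approach, the ordering guarantee applies per message group, so the choice of group ID matters. The sketch below builds the parameters for one message; using the user ID as the group and a client-supplied request ID for deduplication are assumptions for illustration.

```python
import json


def build_fifo_message(user_id: str, field: str, value, request_id: str) -> dict:
    """Keyword arguments for sending one saveInput request to an SQS FIFO queue."""
    return {
        "MessageBody": json.dumps({"field": field, "value": value}),
        # Messages sharing a group ID are delivered in the order they were sent.
        "MessageGroupId": user_id,
        # A retried request with the same ID is dropped within the dedup window.
        "MessageDeduplicationId": request_id,
    }


print(build_fifo_message("user-42", "addressType", 1, "req-19"))
```

Grouping by user keeps one user's updates in order while still letting different users be processed in parallel.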
AWS Lambda is asynchronous by design. That means that trying to make it synchronous and predictable is kind of a waste.
If your concern is preventing "old" data (in the sense of scheduling) from overwriting "fresh" data, then you might consider timestamping each piece of data and applying a constraint like "to overwrite the target data, your source timestamp has to be later than the timestamp of the targeted data".
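The timestamp constraint can be sketched with a plain dictionary standing in for the database row. The field layout and the function name are assumptions; in a real database the comparison and the write would need to happen atomically, for example as a conditional update.

```python
def apply_if_newer(stored: dict, field: str, value, source_ts: int) -> bool:
    """Write value only if source_ts is later than the timestamp already stored."""
    current = stored.get(field)
    if current is not None and current["ts"] >= source_ts:
        return False  # stale request: keep the fresher value
    stored[field] = {"value": value, "ts": source_ts}
    return True


row = {}
# The request sent later (ts=17611) happens to finish first.
print(apply_if_newer(row, "addressType", 1, source_ts=17611))  # True
# The request sent earlier (ts=16828) finishes afterwards and is rejected.
print(apply_if_newer(row, "addressType", 2, source_ts=16828))  # False
print(row["addressType"]["value"])  # 1
```

The timestamp has to come from the caller (when the user made the change), not from the moment the Lambda happens to run.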

CQRS + Microservices Handling event rollback

We are using microservices, cqrs, event store using nodejs cqrs-domain, everything works like a charm and the typical flow goes like:
1. REST -> 2. Service -> 3. Command validation -> 4. Command -> 5. Aggregate -> 6. Event -> 7. Event store (transactional data) -> 8. Returns aggregate with aggregate ID -> 9. Store in microservice local DB (essentially the read DB) -> 10. Publish event to the queue
The problem with the flow above is that the transactional data save (i.e. persistence to the event store) and the storage to the microservice's read data happen in different transaction contexts. If there is any failure at step 9, how should I handle the event which has already been propagated to the event store and the aggregate which has already been updated?
Any suggestions would be highly appreciated.
The problem with the flow above is that the transactional data save (i.e. persistence to the event store) and the storage to the microservice's read data happen in different transaction contexts. If there is any failure at step 9, how should I handle the event which has already been propagated to the event store and the aggregate which has already been updated?
You retry it later.
The "book of record" is the event store. The downstream views (the "published events", the read models) are derived from the book of record. They are typically behind the book of record in time (eventual consistency) and are not typically synchronized with each other.
So you might have, at some point in time, 105 events written to the book of record, but only 100 published to the queue, and a representation in your service database constructed from only 98.
Updating a view is typically done in one of two ways. You can, of course, start with a brand new representation and replay all of the events into it as part of each update. Alternatively, you track in the metadata of the view how far along in the event history you have already gotten, and use that information to determine where the next read of the event history begins.
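The second approach (tracking how far the view has gotten) can be sketched like this. The view layout, with a position counter stored next to the state, and the function names are assumptions for illustration.

```python
def catch_up(view: dict, event_log: list, apply) -> dict:
    """Apply the events the view has not seen yet, advancing its checkpoint."""
    for event in event_log[view["position"]:]:
        apply(view["state"], event)
        view["position"] += 1  # advance only after the event was applied
    return view


def count_events(state: dict, event: dict) -> None:
    """Toy projection: count how many events have been applied."""
    state["count"] = state.get("count", 0) + 1


log = [{"id": i} for i in range(105)]                # 105 events in the book of record
view = {"position": 98, "state": {"count": 98}}      # view built from only 98
catch_up(view, log, count_events)
print(view["position"], view["state"]["count"])      # 105 105
```

If the update fails halfway, the checkpoint stays at the last applied event, so the retry simply resumes from there.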
Inside your event store, you could track whether read-side replication was successful.
As soon as step 9 succeeds, you can flag the event as 'replicated'.
That way, you could introduce a component that watches for unreplicated events and triggers step 9. You could also track whether the replication failed multiple times.
Updating the read side (step 9) and flagging an event as replicated should happen consistently. You could use a saga pattern here.
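A watcher for unreplicated events might look roughly like the sketch below. The replicated and attempts fields and the max_attempts limit are assumptions for illustration, and the events are plain dictionaries rather than cqrs-domain objects.

```python
def replicate_pending(event_store: list, apply_to_read_model, max_attempts: int = 3) -> list:
    """Retry step 9 for every unflagged event; return the ids that exhausted retries."""
    gave_up = []
    for event in event_store:
        if event["replicated"]:
            continue
        if event["attempts"] >= max_attempts:
            gave_up.append(event["id"])  # needs manual attention
            continue
        try:
            apply_to_read_model(event)   # step 9; should be idempotent
            event["replicated"] = True
        except Exception:
            event["attempts"] += 1       # leave unflagged for the next run
    return gave_up


store = [
    {"id": 1, "replicated": False, "attempts": 0},
    {"id": 2, "replicated": True, "attempts": 0},
]
print(replicate_pending(store, lambda event: None))  # []
print(store[0]["replicated"])                        # True
```

Because the read-side write and the flag update are two separate steps here, the read-side write has to tolerate being applied twice.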
I think I have now understood it better.
The aggregate would still be created. The answer is that all validations for any type of consistency should happen before my aggregate is constructed; what needs to be handled is a failure beyond the purview of the code that occurs while updating the read-side DB of the microservice.
So in the ideal case the aggregate would be created; however, the associated event would remain undispatched until all the read dependencies are updated, and events left undispatched can be handled separately.
The event store will still have all the events, and eventual consistency is maintained this way.

Check if S3 file has been modified

How can I use a shell script to check if an Amazon S3 file (a small .xml file) has been modified? I'm currently using curl to check every 10 seconds, but it's making many GET requests.
while true
do
    # download the current copy of the file
    curl -s -o "file.xml" "s3.aws.amazon.com/bucket/file.xml"
    if cmp -s "file.xml" "current.xml"
    then
        echo "no change"
    else
        echo "file changed"
        cp "file.xml" "current.xml"
    fi
    sleep 10
done
Is there a better way to check every 10 seconds that reduces the number of GET requests? (This is built on top of a Rails app, so I could possibly build a handler in Rails?)
Let me start by first telling you some facts about S3. You might know this, but in case you don't, you might see that your current code could have some "unexpected" behavior.
S3 and "Eventual Consistency"
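At the time of writing, S3 provided "eventual consistency" for overwritten objects (since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs). From the S3 FAQ of that time, you have: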
Q: What data consistency model does Amazon S3 employ?
Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
Eventual consistency for overwrites means that, whenever an object is updated (i.e., whenever your small XML file is overwritten), clients retrieving the file MAY see the new version, or they MAY see the old version. For how long? For an unspecified amount of time. It typically achieves consistency in much less than 10 seconds, but you have to assume that it will, eventually, take more than 10 seconds to achieve consistency. More interestingly (sadly?), even after a successful retrieval of the new version, clients MAY still receive the older version later.
One thing that you can be assured of is: if a client starts downloading a version of the file, it will download that entire version (in other words, there's no chance that you would receive, for example, the first half of the XML file as the old version and the second half as the new version).
With that in mind, notice that your script could fail to identify the change within your 10-second timeframe: you could make multiple requests, even after a change, until your script downloads a changed version. And even then, after you detect the change, it is (unfortunately) entirely possible that the next request would download the previous (!) version, and trigger yet another "change" in your code, then the next would give the current version, and trigger yet another "change" in your code!
If you are OK with the fact that S3 provides eventual consistency, there's a way you could possibly improve your system.
Idea 1: S3 event notifications + SNS
You mentioned that you thought about using SNS. That could definitely be an interesting approach: you could enable S3 event notifications and then get a notification through SNS whenever the file is updated.
How do you get the notification? You would need to create a subscription, and here you have a few options.
Idea 1.1: S3 event notifications + SNS + a "web app"
If you have a "web application", i.e., anything running behind a publicly accessible HTTP endpoint, you could create an HTTP subscriber, so SNS will call your server with the notification whenever it happens. This might or might not be possible or desirable in your scenario.
Idea 2: S3 event notifications + SQS
You could create a message queue in SQS and have S3 deliver the notifications directly to the queue. This would also be possible as S3 event notifications + SNS + SQS, since you can add a queue as a subscriber to an SNS topic (the advantage being that, in case you need to add functionality later, you could add more queues and subscribe them to the same topic, therefore getting "multiple copies" of the notification).
To retrieve the notification you'd make a call to SQS. You'd still have to poll, i.e., have a loop and call GET on SQS (which costs about the same as S3 GETs, or maybe a tiny bit more depending on the region). The slight difference is that you could reduce the total number of requests a bit: SQS supports long-polling requests of up to 20 seconds. You make the GET call on SQS and, if there are no messages, SQS holds the request for up to 20 seconds, returning immediately if a message arrives, or returning an empty response if no messages are available within those 20 seconds. So, you would send only 1 GET every 20 seconds, and get faster notifications than you currently have. You could potentially halve the number of GETs you make (once every 10s to S3 vs once every 20s to SQS).
Also, you could choose to use one single SQS queue to aggregate all changes to all XML files, or multiple SQS queues, one per XML file. With a single queue, you would greatly reduce the overall number of GET requests. With one queue per XML file, that's when you could potentially "halve" the number of GET requests as compared to what you have now.
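A long-polling receive loop body could be sketched as below. The client is passed in, so anything with receive_message and delete_message methods shaped like the boto3 SQS client's would work; the function name and the choice to delete messages right after reading are assumptions for illustration.

```python
def wait_for_change(sqs_client, queue_url: str, wait_seconds: int = 20) -> list:
    """Long-poll the queue once and return the bodies of any notifications."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=wait_seconds,   # SQS holds the request up to this long
        MaxNumberOfMessages=10,
    )
    messages = response.get("Messages", [])  # key is absent when nothing arrived
    for message in messages:
        # Delete after reading so the same notification is not delivered again.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return [message["Body"] for message in messages]


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   bodies = wait_for_change(boto3.client("sqs"), "<your queue URL>")
```

Calling this in a loop gives at most one request every 20 seconds while idle, and an immediate return when a notification arrives.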
Idea 3: S3 event notifications + AWS Lambda
You can also use a Lambda function for this. This could require some more changes in your environment: you wouldn't use a shell script to poll, but S3 can be configured to call a Lambda function for you in response to an event, such as an update to your XML file. You could write your code in Java, JavaScript or Python (some people devised some "hacks" to use other languages as well, including Bash).
The beauty of this is that there's no more polling, and you don't have to maintain a web server (as in "idea 1.1"). Your code "simply runs", whenever there's a change.
Notice that, no matter which one of these ideas you use, you still have to deal with eventual consistency. In other words, you'd know that a PUT/POST has happened, but once your code sends a GET, you could still receive the older version...
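For the Lambda idea, a handler that pulls the bucket and key out of an S3 notification could look like this minimal sketch. The handler name and returning the list of changed objects are choices made for illustration; what to do with each change is left out.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) pairs for created or overwritten objects."""
    changed = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        changed.append((bucket, key))
    return changed


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "bucket"}, "object": {"key": "my+file.xml"}},
        }
    ]
}
print(handler(sample_event, None))  # [('bucket', 'my file.xml')]
```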
Idea 4: Use DynamoDB instead
If you have the ability to make a more structural change on the system, you could consider using DynamoDB for this task.
The reason I suggest this is because DynamoDB supports strong consistency, even for updates. Notice that it's not the default - by default, DynamoDB operates in eventual consistency mode, but the "retrieval" operations (GetItem, for example), support fully consistent reads.
Also, DynamoDB has what we call "DynamoDB Streams", which is a mechanism that allows you to get a stream of changes made to any (or all) items on your table. These notifications can be polled, or they can even be used in conjunction with a Lambda function, that would be called automatically whenever a change happens! This, plus the fact that DynamoDB can be used with strong consistency, could possibly help you solve your problem.
In DynamoDB, it's usually a good practice to keep the records small. You mentioned in your comments that your XML files are about 2kB - I'd say that could be considered "small enough" so that it would be a good fit for DynamoDB! (the reasoning: DynamoDB reads are typically calculated as multiples of 4kB; so to fully read 1 of your XML files, you'd consume just 1 read; also, depending on how you do it, for example using a Query operation instead of a GetItem operation, you could possibly be able to read 2 XML files from DynamoDB consuming just 1 read operation).
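The read-cost reasoning above can be checked with a little arithmetic. This sketch assumes the usual provisioned-capacity rules: a strongly consistent read costs one read unit per 4 KB (rounded up), an eventually consistent read costs half of that, GetItem rounds per item, and Query rounds the combined size of the returned items.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read unit covers up to 4 KB


def get_item_read_units(item_bytes: int, consistent: bool = True) -> float:
    """Read units consumed by a single GetItem call."""
    units = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    return units if consistent else units / 2


def query_read_units(item_sizes: list, consistent: bool = True) -> float:
    """Read units consumed by one Query returning items of the given sizes."""
    units = max(1, math.ceil(sum(item_sizes) / READ_UNIT_BYTES))
    return units if consistent else units / 2


two_kb = 2 * 1024
print(get_item_read_units(two_kb))         # 1
print(query_read_units([two_kb, two_kb]))  # 1
```

So two separate GetItem calls for two 2 KB files cost 2 read units, while one Query returning both costs 1.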
Some references:
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
http://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
I can think of another way by using S3 Versioning; this would require the least amount of changes to your code.
Versioning is a means of keeping multiple variants of an object in the same bucket.
This would mean that every time a new file.xml is uploaded, S3 will create a new version.
In your script, instead of getting the object and comparing it, get the HEAD of the object which contains the VersionId field. Match this version with the previous version to find out if the file has changed.
If the file has indeed changed, get the new file, and also get the new version of that file and save it locally so that next time you can use this version to check if a newer-newer version has been uploaded.
Note 1: You will still be making lots of calls to S3, but instead of fetching the entire file every time, you are only fetching the metadata of the file which is much faster and smaller in size.
Note 2: However, if your aim was to reduce the number of calls, the easiest solution I can think of is using lambdas. You can trigger a lambda function every time a file is uploaded that then calls the REST endpoint of your service to notify you of the file change.
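The HEAD-based version check could be sketched as follows. The client is passed in and is expected to behave like a boto3 S3 client; the function name and the returned tuple are assumptions for illustration, and the bucket needs versioning enabled for VersionId to be present.

```python
def check_for_new_version(s3_client, bucket: str, key: str, last_version_id):
    """Return (changed, current_version_id) using only a HEAD request."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    version_id = head.get("VersionId")  # present when bucket versioning is on
    return version_id != last_version_id, version_id


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   changed, version = check_for_new_version(
#       boto3.client("s3"), "bucket", "file.xml", last_seen_version
#   )
```

Only when changed is true does the script need to issue the full GET and store the new version ID for the next comparison.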
You can use --exact-timestamps
see AWS discussion
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Instead of using versioning, you can simply compare the ETag of the file, which is available in the response headers and is similar to the MD5 hash of the file. It is exactly the MD5 hash when the object was uploaded in a single request without SSE-KMS or SSE-C encryption; for multipart uploads it is instead the MD5 hash of the concatenated binary MD5 digests of the parts, followed by a hyphen and the number of parts.
With that said, I would suggest you look at your application again and ask if there is a way you can avoid this critical path.
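For the ETag comparison, the expected value can be computed locally. This is a sketch under the assumptions stated in the comments: the single-part form applies to plain single-request uploads, and the multipart form only matches if you use the same part size the uploader used.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """ETag of an object uploaded in one request (no SSE-KMS/SSE-C)."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """ETag of a multipart upload that used fixed-size parts of part_size bytes."""
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    return hashlib.md5(b"".join(digests)).hexdigest() + "-" + str(len(digests))


def etag_matches(etag_header: str, data: bytes) -> bool:
    """Compare a local copy against an ETag header (the header value is quoted)."""
    return etag_header.strip('"') == single_part_etag(data)


print(single_part_etag(b"hello"))  # 5d41402abc4b2a76b9719d911017c592
print(etag_matches('"5d41402abc4b2a76b9719d911017c592"', b"hello"))  # True
```

For a 2 kB XML file the single-part case is the one that normally applies, so a HEAD request plus one local MD5 is enough to detect a change.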

Resources