AWS Lambda caching layer using Extensions

I have a Lambda function that uses SSM Parameter Store values. It queries Parameter Store and stores the fetched values in Lambda environment variables so that subsequent invocations can read them from the environment instead of calling Parameter Store. This works fine while the Lambda is warm, but during cold starts at peak traffic the function still makes many calls to Parameter Store and gets throttling exceptions.
I'm looking to reduce the number of calls to Parameter Store by adding a caching layer. I found an article online about doing this with Lambda extensions, but I'm new to them. Before I build a POC, I just wanted to check whether this cache survives across cold starts. Please advise.
Thanks in advance!
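For reference, here is a minimal sketch of what the extension-based approach looks like, assuming the AWS Parameters and Secrets Lambda Extension layer is attached to the function (the parameter name is hypothetical):

```python
import json
import os
import urllib.parse
import urllib.request

# Default port for the AWS Parameters and Secrets Lambda Extension;
# configurable via PARAMETERS_SECRETS_EXTENSION_HTTP_PORT.
PORT = os.environ.get("PARAMETERS_SECRETS_EXTENSION_HTTP_PORT", "2773")

def get_parameter(name: str) -> str:
    url = (f"http://localhost:{PORT}/systemsmanager/parameters/get"
           f"?name={urllib.parse.quote(name, safe='')}")
    # The extension authenticates callers with the function's session token.
    req = urllib.request.Request(
        url,
        headers={"X-Aws-Parameters-Secrets-Token": os.environ["AWS_SESSION_TOKEN"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["Parameter"]["Value"]

def handler(event, context):
    # Repeat lookups in the same execution environment are served from the
    # extension's in-memory cache instead of hitting Parameter Store.
    db_host = get_parameter("/myapp/db-host")  # hypothetical parameter name
    return {"dbHost": db_host}
```

Note that the extension's cache lives inside a single execution environment, so each cold start still makes one fresh Parameter Store call per parameter; what it eliminates is repeated calls from warm invocations.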

Related

DynamoDB:PutItem calls silently ignored

I have a Lambda function bound to CodeBuild notifications; each Lambda instance writes details of the notification that triggered it to a DynamoDB table (BillingMode: PAY_PER_REQUEST).
Each CodeBuild notification spawns an independent Lambda instance. A CodeBuild build can spawn 7-8 separate notifications/Lambda instances, many of which often happen simultaneously.
The Lambda function uses DynamoDB:PutItem to put details of the notification into DynamoDB. What I find is that out of 7-8 notifications in a 30-second period, sometimes all 7-8 get written to DynamoDB, but sometimes as few as 0-1 are; many calls to DynamoDB:PutItem simply seem to be "ignored".
Why is this happening?
My guess is that DynamoDB simply shouldn't be accessed by multiple Lambda instances in this way; that best practice is to push the updates to an SQS queue bound to a separate Lambda, and have that separate Lambda write the many updates to DynamoDB as part of a transaction.
Is that right? Why might parallel independent calls to DynamoDB:PutItem fail silently?
TIA.
DynamoDB is accessed over a web endpoint and can handle any number of concurrent connections, so the issue is not how many Lambdas are writing.
I typically see this happen when users do not let the Lambda wait until the API requests are complete, so the container gets shut down prematurely. I would first check your code and ensure that your Lambda stays alive until all items have been processed; you can verify this by adding some simple logging in your code.
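For what it's worth, here is a minimal sketch of the safe pattern in Python with boto3 (table name and event shape are hypothetical): the write completes synchronously and its HTTP status is logged before the handler returns, so nothing is left in flight when the execution environment is frozen.

```python
import json
import logging
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Create the client at module scope so it is reused across invocations.
dynamodb = boto3.client("dynamodb")

def handler(event, context):
    # boto3's put_item is synchronous: it completes (or raises) before the
    # handler returns. The silent-loss pattern usually comes from firing
    # writes on background threads or unawaited futures and returning early.
    response = dynamodb.put_item(
        TableName="BuildNotifications",  # hypothetical table name
        Item={
            "NotificationId": {"S": event["detail"]["build-id"]},  # assumed event shape
            "Payload": {"S": json.dumps(event)},
        },
    )
    logger.info("PutItem HTTP status: %s",
                response["ResponseMetadata"]["HTTPStatusCode"])
    return {"ok": True}
```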
What you are describing is a good use case for Step Functions.
As much as Lambda functions are great glue between services, they have their overheads and their limitations. With Step Functions, you can call DynamoDB:PutItem directly, and you can handle various scenarios and flows, such as async calls. These flows are possible to implement in a Lambda function, but with less visibility and less traceability.
BTW, you can also call a Lambda function from Step Functions; however, I recommend trying the direct service call to maximize the benefits of the Step Functions service.
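As a rough illustration of the direct integration, here is a sketch of a state machine definition with a dynamodb:putItem task, written as a Python dict for convenience; the table, attributes, and role ARN are hypothetical:

```python
import json
import boto3

# Amazon States Language definition using the direct DynamoDB service
# integration; no Lambda function sits in the write path.
definition = {
    "StartAt": "PutNotification",
    "States": {
        "PutNotification": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": "BuildNotifications",  # hypothetical table
                "Item": {
                    # The .$ suffix pulls values from the state input.
                    "NotificationId": {"S.$": "$.notificationId"},
                    "Status": {"S.$": "$.status"},
                },
            },
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="put-notification",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/sfn-dynamodb-role",  # hypothetical role
)
```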
My mistake, I had a separate issue which was messing up some of the range keys and causing updates to "fail" silently. But thanks for the tip regarding timeouts.

AWS cost one vs many lambda functions

I am currently developing an API management platform where it is possible to move every endpoint action to a serverless function (lambda).
My question is: is it cheaper to use a single Lambda function that invokes the complete app and routes internally, or is it better to use AWS routing and create a Lambda for each endpoint, which in my case could be 100+ Lambda functions?
From a technical perspective I think it is better to have multiple Lambda functions, since we can then scale each function independently, but I am not sure how it looks on the cost side. So please let me know if you have any experience.
Look here:
https://s3.amazonaws.com/lambda-tools/pricing-calculator.html
The most important thing to keep in mind is execution time: long-running functions can increase your bill, while many fast Lambda invocations will not!
You need to know your execution times! To maintain a very large set of Lambda functions I recommend:
https://www.serverless.com/
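As a back-of-the-envelope sanity check, the cost model is simple enough to compute directly. The rates below are the commonly published x86 figures ($0.20 per 1M requests and $0.0000166667 per GB-second); they vary by region, so treat them as assumptions and verify with the calculator above. Billing is per invocation, duration, and memory, not per function, so one big function and 100 small ones cost the same for the same total workload.

```python
REQUEST_RATE = 0.20 / 1_000_000  # $ per request (assumed x86 pricing)
GB_SECOND_RATE = 0.0000166667    # $ per GB-second (assumed x86 pricing)

def monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    compute = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024) * GB_SECOND_RATE
    requests = invocations * REQUEST_RATE
    return compute + requests

# e.g. 10M invocations/month at 120 ms on 256 MB -> $7.00/month
print(f"${monthly_cost(10_000_000, 120, 256):.2f}/month")
```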

Best method to persist data from an AWS Lambda invocation?

I use AWS Simple Email Services (SES) for email. I've configured SES to save incoming email to an S3 bucket, which triggers an AWS Lambda function. This function reads the new object and forwards the object contents to an alternate email address.
I'd like to log some basic info from my AWS Lambda function during invocation: who the email is from, to whom it was sent, whether it contained any links, etc.
Ideally I'd save this info to a database, but since AWS Lambda invocations are relatively costly compared to other AWS operations, I'd like to do this as efficiently as possible.
I was thinking I could issue an HTTPS GET request to a private endpoint with a query string containing the info I want logged. Since I could fire the request asynchronously at the outset and continue processing, I thought this might be a cheap and efficient approach.
Is this a good method? Are there any alternatives?
My Lambda function fires irregularly, so despite Lambda containers being kept alive for 10 minutes or so after firing, it seems a database connection is likely to be slow and costly, since AWS charges per 100ms of usage.
Since I could conceivably get thousands of emails/month, ensuring my Lambda function is efficient is paramount for cost. I maintain 100s of domain names, so my numbers aren't exaggerated. Thanks in advance.
I do not think thousands of emails per month should be a problem; these cloud services have been developed with scalability in mind and can go way beyond the numbers you are suggesting.
In terms of persisting, I cannot really tell, in the absence of logs and metrics, why your DB connection would be slow. From the moment you are inside AWS, traffic runs over its internal infrastructure, so speeds will be high and not something you should be worrying about.
I am not an expert on billing, but from what you are describing, Lambda + S3 + DynamoDB seems highly optimised for your use case.
From the type of data you are describing (email data), it doesn't seem that you would have either a memory issue (Lambdas have memory constraints, which can be a pain) or an I/O bottleneck. If you can share more details on the memory used during invocation and the time taken, that would be great, as well as how much data you store on each invocation.
I think you could store JSON-ified strings of your email data in DynamoDB easily; it should be pretty seamless and not that costly.
I have not used SES, but you could put a trigger on DynamoDB so that whenever you store a record, another Lambda can follow up if you want one.
You could combine S3 + DynamoDB: when you store a record, simply upload a file containing the record to a new S3 key and update the row in DynamoDB with a pointer to the new S3 object.
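A minimal sketch of that S3 + DynamoDB pattern (bucket, table, and record fields are hypothetical):

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

BUCKET = "email-archive-bucket"  # hypothetical bucket
TABLE = "EmailLog"               # hypothetical table

def log_email(record: dict) -> None:
    # Store the full record in S3 under a fresh key...
    key = f"emails/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(record))
    # ...and keep a small DynamoDB item that points at it.
    dynamodb.put_item(
        TableName=TABLE,
        Item={
            "MessageId": {"S": record["message_id"]},
            "From": {"S": record["from"]},
            "To": {"S": record["to"]},
            "S3Pointer": {"S": f"s3://{BUCKET}/{key}"},
        },
    )
```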
You can now persist data using AWS EFS.
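With an EFS file system mounted on the function (the mount path is configured on the Lambda itself and is hypothetical here), persistence becomes plain file I/O:

```python
import json

def handler(event, context):
    # Appends survive across invocations because EFS is durable,
    # unlike the container-local /tmp directory.
    with open("/mnt/data/email-log.jsonl", "a") as f:  # hypothetical mount path
        f.write(json.dumps(event) + "\n")
```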

Amazon Web Services: Spark Streaming or Lambda

I am looking for some high-level guidance on an architecture. I have a provider writing "transactions" to a Kinesis stream (about 1MM/day). I need to pull those transactions off, one at a time, validating the data, hitting other SOAP or REST services for additional information, applying some business logic, and writing the results to S3.
One approach that has been proposed is to use a Spark job that runs forever, pulling data and processing it within the Spark environment. The benefits enumerated were shareable cached data, availability of SQL, and in-house knowledge of Spark.
My thought was to have a series of Lambda functions process the data. As I understand it, I can have a Lambda watching the Kinesis stream for new data. I want to run the pulled data through a series of small steps (Lambdas), each one doing a single step in the process. This seems like an ideal use of Step Functions. With regard to caches, if any are needed, I thought Redis on ElastiCache could be used.
Can this be done using a combination of Lambda and Step Functions (using lambdas)? If it can be done, is it the best approach? What other alternatives should I consider?
This can be achieved using a combination of Lambda and Step Functions. As you described, the lambda would monitor the stream and kick off a new execution of a state machine, passing the transaction data to it as an input. You can see more documentation around kinesis with lambda here: http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html.
The state machine would then pass the data from one Lambda function to the next, where the data will be processed and written to S3. You will need to contact AWS for an increase to the default 2-per-second StartExecution API limit to support 1MM/day.
Hope this helps!
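A minimal sketch of the watcher Lambda described above (the state machine ARN is hypothetical): Kinesis delivers record payloads base64-encoded, so each one is decoded and handed to StartExecution as input.

```python
import base64
import json
import boto3

sfn = boto3.client("stepfunctions")
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:txn"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded.
        txn = json.loads(base64.b64decode(record["kinesis"]["data"]))
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps(txn),
        )
```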

How to load big file model in a lambda function

Say I want a Lambda function to predict an incoming message's category with a trained model. However, the model is over-sized (~1 GB).
With the current architecture, I would upload the trained model to AWS S3 and then load it every time the Lambda is triggered. This is not desirable, since most of the time is spent loading the model.
Some solutions in mind:
Don't use Lambda; have a dedicated EC2 instance do the work
Keep it warm by periodically sending dummy requests
Or, I suspect AWS will cache the file, so the next load time could be shorter?
I think reading about container reuse in lambda could be helpful here.
https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/
You can keep the model as a global cached variable by declaring and initialising it outside the handler function. If Lambda reuses the same container for subsequent requests, the file won't be re-downloaded.
But it's entirely up to Lambda whether to reuse the container or start a new one. Since this is Lambda's prerogative, you can't depend on this behaviour.
If you want to minimise the number of downloads from S3, an external managed caching solution (ElastiCache for Redis) in the same AZ as your function is a possible alternative you can look at.
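A minimal sketch of the global-variable approach, assuming a pickled model in S3 (bucket, key, and model API are hypothetical):

```python
import pickle
import boto3

s3 = boto3.client("s3")

# Module scope survives across invocations whenever Lambda reuses the container.
_model = None

def _get_model():
    global _model
    if _model is None:
        # This download only runs on a cold start (or whenever Lambda
        # decides to spin up a fresh container).
        obj = s3.get_object(Bucket="my-model-bucket", Key="model.pkl")  # hypothetical
        _model = pickle.loads(obj["Body"].read())  # assumes a pickled model
    return _model

def handler(event, context):
    model = _get_model()
    return model.predict([event["message"]])  # assumes a scikit-learn-style API
```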
