How to limit parallel execution of a serverless Lambda function

I am using AWS with the Serverless Framework. My Lambda function is triggered by events, and it then talks to a database, which limits the number of connections I can open.
So I want to run only 5 Lambda functions at a time and queue the other events. I know there is:
provisionedConcurrency: 3 # optional, count of provisioned lambda instances
reservedConcurrency: 5 # optional, reserved concurrency limit for this function. By default, AWS uses the account concurrency limit
With these settings, the specified number of long-running instances will be kept around and will serve the events.
But rather than that, what I want is event queuing, with the functions triggered such that at most 5 of them are running at a time.
Is this notion of event queuing supported in AWS?

In AWS Lambda, a concurrency limit determines how many invocations of your functions can run simultaneously in one region. You can set this limit through the AWS Lambda console or through the Serverless Framework.
If your account limit is 1000 and you reserved 100 concurrent executions for a specific function and 100 concurrent executions for another, the rest of the functions in that region will share the remaining 800 executions.
If you reserve concurrent executions for a specific function, AWS Lambda assumes that you know how many to reserve to avoid performance issues. Functions with allocated concurrency can’t access unreserved concurrency.
The right way to set the reserved concurrency limit in Serverless Framework is the one you shared:
functions:
  hello:
    handler: handler.hello # required, handler set in AWS Lambda
    reservedConcurrency: 5 # optional, reserved concurrency limit for this function. By default, AWS uses the account concurrency limit
I would suggest using SQS to manage your queue. One of the common architectural reasons for using a queue is to limit the pressure on another part of your architecture; this could mean preventing the overloading of a database, or avoiding rate limits on a third-party API when processing a large batch of messages.
For example, let's apply this to your case, where the SQS processing logic needs to connect to a database. You want your workers to hold no more than 5 open connections to the database at a time; with concurrency control, you can set limits that keep your architecture up.
In your case you could have a function, hello, that receives your requests and puts them in an SQS queue. On the other side, a function compute picks those SQS messages up and processes them, with concurrent invocations limited to 5 (a handler sketch follows the config below).
You can even set a batch size, which is the number of SQS messages that can be included in a single Lambda invocation.
functions:
  hello:
    handler: handler.hello
  compute:
    handler: handler.compute
    reservedConcurrency: 5
    events:
      - sqs:
          arn: arn:aws:sqs:region:XXXXXX:myQueue
          batchSize: 10 # how many SQS messages can be included in a single Lambda invocation
          maximumBatchingWindow: 60 # maximum amount of time in seconds to gather records before invoking the function
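To make the flow concrete, here is a minimal sketch in Python (boto3) of what the two handlers could look like. The queue URL environment variable and the process_with_db helper are assumptions for illustration, not part of the question:

import json
import os
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = os.environ['QUEUE_URL']  # assumed to point at myQueue

def hello(event, context):
    # enqueue the incoming event instead of touching the database directly
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(event))
    return {'statusCode': 202, 'body': 'queued'}

def compute(event, context):
    # with reservedConcurrency: 5, at most 5 copies of this handler run at
    # once, so at most 5 database connections are open at any time
    for record in event['Records']:  # up to batchSize messages per invocation
        payload = json.loads(record['body'])
        process_with_db(payload)  # hypothetical helper that talks to the DB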

Have you considered a proxy endpoint (acting like a pool), such as RDS Proxy, instead of limiting the concurrency of the Lambda? Also, I think the Lambda <-> SQS communication happens via a pool of event-source pollers, and setting the concurrency lower than however many threads they have going will force you to handle lost messages.
https://aws.amazon.com/rds/proxy/

Related

Provisioned concurrency has minor impact on response time of Lambda function

We are using a serverless architecture with AWS Lambda and an API Gateway. The execution time of the Lambda is in the range of a few milliseconds, but the final response at the client's end is received in seconds, which is far more than the Lambda's execution time even when the init duration is taken into account for cold-start scenarios.
While debugging this using the API Gateway logs, I saw integration latency in seconds, which makes the end-to-end response considerably slow. To remove the init duration (cold start) I added CloudWatch rules to call the Lambdas periodically and keep them warm.
The init duration was removed completely, and this helped reduce the integration latency as well. Some Lambdas cannot be scheduled this way because calling them requires authentication, so for those I added a provisioned concurrency of 5.
These Lambdas still show an init duration in the logs. Provisioned concurrency is another option to get rid of cold starts, but it is not having an impact on the time at which the Lambda's response is available at the API Gateway.
I have followed the links below to assign provisioned concurrency to the Lambdas:
Provisioned Concurrency: What it is and how to use it with the Serverless Framework
AWS News Blog – Provisioned Concurrency for Lambda Functions
CloudWatch logs of the Lambda to which I added provisioned concurrency:
Duration: 1331.38 ms Billed Duration: 1332 ms Memory Size: 256 MB Max Memory Used: 130 MB Init Duration: 1174.18 ms
One thing I could notice in the API Gateway and Lambda logs is that the request was sent from API Gateway at 2021-02-15T11:51:36.621+05:30 but received at the Lambda at 2021-02-15T11:51:38.535+05:30. There is about 2 seconds of delay in getting the request to the Lambda.
AWS X-RAY TRACING
I have enabled AWS X-Ray logging for both API Gateway and Lambda, and this is what the traces show. The request took 595 ms in total, but at Postman the response was received in 1558 ms. Where is the delay of approximately 1 second being added before the response is received from API Gateway?
I believe the reason is that the provisioned concurrency of 5 is not enough and you still run into cold starts of your Lambda function. This means if the external service is calling your API endpoint (i.e. your Lambda function behind API Gateway), your Lambda function is warm with 5 instances. If we assume your Lambda function can handle 2 requests per second (500ms for each invocation), then you can roughly handle 10 requests per second with your Lambda function. If the external service is making 20 requests per second, AWS Lambda tries to spin up new instances because the existing ones are busy handling requests. This has the consequence that the external service experiences high response times because of cold starts of your function.
Also, consider that the instances of your Lambda function do not live "forever" but are cleaned up after some point. I.e. if you experience many spikes in your traffic patterns, then this can mean that after one spike the instances live like 15 minutes, then AWS Lambda shuts them down to only keep the 5 provisioned ones and if then another spike comes, you'll see the same problem as before.
Please note: This is a very simplified explanation of what's happening behind the scenes, and more of a good guess based on your description. It would help if you'd provide some example numbers (e.g. init duration, execution duration, response time) and maybe some example code of what you're doing in your Lambda function. Also: which runtime are you using? What does your traffic pattern look like?
Potential solutions
Reduce the cold start time of your Lambda functions -> always a good idea for Lambda functions that are behind an API Gateway
Provision more instances -> only possible up to a certain (soft) limit
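As a rough sanity check of the arithmetic in this answer, required concurrency is approximately arrival rate times average duration. A short Python sketch with the same illustrative numbers (these are assumptions from the example above, not measurements from the question):

# required concurrency ~= requests per second * average duration in seconds
requests_per_second = 20        # assumed external traffic
avg_duration_seconds = 0.5      # assumed per-invocation latency
required = requests_per_second * avg_duration_seconds  # = 10 concurrent executions
provisioned = 5
if required > provisioned:
    # the overflow is served by on-demand instances, each paying a cold start
    print(f"expect cold starts: need ~{required:.0f}, have {provisioned} provisioned")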
===== Flow-of-Services =====
API GW ⮕ Lambda function (Provisioned)
===== Query =====
You want to understand why there is latency while processing the request from API GW to Lambda function.
===== Time-stamp-of-CW =====
2021-02-15T11:51:36.621+05:30
2021-02-15T11:51:38.535+05:30
Lambda duration - Duration: 1331.38 ms Billed Duration: 1332 ms Memory Size: 256 MB Max Memory Used: 130 MB Init Duration: 1174.18 ms
===== Follow-up questions =====
While the request was processed from the API GW to the Lambda function, the execution environment took 1174.18 ms (~1.1 s) to become active and then executed your code in the remaining ~0.3 s, which makes a total of about 1.4 seconds.
Q. What is the type of processor you are using?
Q. Type of API & Endpoint type?
===== K.C =====
You should read the AWS docs on optimizing your Lambda function code: Optimizing static initialization.
Lambda won't charge you for the time it takes to initialize your code (e.g. importing stuff) as long as it finishes in about X seconds.
===== Replication Observation =====
Without provisioned concurrency:
API GW execution time - 286 ms
Initialization - 195 ms
Invocation - 11 ms
Overhead - 0 ms
With provisioned concurrency:
API GW execution time - 1.103 ms
Initialization - 97 ms
Invocation - 1 ms
Overhead - 0 ms
I'm in the US-WEST-2 region and calling the request from 12,575 km away from the region. I have a REST API configured with the 'Regional' endpoint type. The Lambda function runs on the x86_64 architecture (64-bit x86, for x86-based processors).
-- Check whether you have optimized your Lambda function code.
-- To get lower latency, you may make use of an 'edge-optimized' REST API. An edge-optimized API endpoint is best for geographically distributed clients: API requests are routed to the nearest CloudFront Point of Presence (POP).
-- Always choose the Region closest to the high-traffic region.
References:
https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/
https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html#optimizing-latency

Is there any way to check if a lambda function has been idle for a given amount of time?

I have a use case where I am supposed to execute a piece of code based on the idle time of a given Lambda function; I mean, if a given function has been idle for, say, 5 minutes, my piece of code should run.
Is there any way to check the Lambda state/status?
I assume you are looking to avoid Lambda cold starts; if so, leverage Provisioned Concurrency, which keeps the configured number of Lambda instances initialized and ready to serve requests.
https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/
If you did not mean this, then I assume by idleness you mean "no requests processed" by the Lambda. If so, use a CloudWatch metric/alarm to monitor the number of invocations over a timeframe and then run whatever you need in its alarm action.
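As a sketch of that CloudWatch route (the function name, threshold, and SNS topic are assumptions, not from the question): alarm on the AWS/Lambda Invocations metric and treat a missing datapoint as idle.

import boto3

cloudwatch = boto3.client('cloudwatch')

# Fires when 'my-function' (hypothetical name) has had zero invocations
# in a 5-minute window; a missing datapoint also counts as idle.
cloudwatch.put_metric_alarm(
    AlarmName='my-function-idle-5min',
    Namespace='AWS/Lambda',
    MetricName='Invocations',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'my-function'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='LessThanThreshold',
    TreatMissingData='breaching',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:idle-topic'],  # assumed topic
)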

Limit AWS SQS messages visible per second or AWS Lambda invocations per second

I am implementing a solution that involves SQS triggering a Lambda function, which uses a 3rd-party API to perform some operations.
That 3rd-party API has a limit of requests per second, so I would like to limit the rate at which SQS messages are processed by my Lambda function to a similar rate.
Is there any way to limit the number of messages visible per second on the SQS queue, or the number of invocations per second of a Lambda function?
[edited]
After some insights given in the comments about AWS Kinesis:
There is no lean solution based on tuning the Kinesis parameters (Batch Window, Batch Size, and payload size), because Kinesis triggers the Lambda execution when ANY of those thresholds is reached:
* Let N = the max number of requests per second I can execute against the 3rd-party API.
* Configuring a Batch Window of 1 second and a Batch Size of N, back pressure could still trigger executions with more than N requests.
* Configuring a Batch Window of 1 second and a Batch Size of MAX_ALLOWED_VALUE will be underperformant and also does not guarantee fewer than N executions per second.
The simplest solution I have found is creating a Lambda with a fixed execution rate of 1 second that reads a fixed number of messages N from SQS/Kinesis and writes them to another SQS/Kinesis queue, which has another Lambda as its endpoint.
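A minimal sketch of that pacing Lambda, assuming SQS on both sides; the queue URLs and the value of N are placeholders, and the function would be scheduled to run every second (e.g. by an EventBridge rule):

import boto3

sqs = boto3.client('sqs')
SOURCE_QUEUE = 'https://sqs.us-east-1.amazonaws.com/123456789012/source'  # assumed
TARGET_QUEUE = 'https://sqs.us-east-1.amazonaws.com/123456789012/paced'   # assumed
N = 5  # max requests per second the 3rd-party API allows

def handler(event, context):
    # move at most N messages per run, so downstream sees at most N per second
    resp = sqs.receive_message(QueueUrl=SOURCE_QUEUE,
                               MaxNumberOfMessages=min(N, 10))  # API caps at 10
    for msg in resp.get('Messages', []):
        sqs.send_message(QueueUrl=TARGET_QUEUE, MessageBody=msg['Body'])
        sqs.delete_message(QueueUrl=SOURCE_QUEUE,
                           ReceiptHandle=msg['ReceiptHandle'])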
This is a difficult situation.
Amazon SQS can trigger multiple AWS Lambda functions in parallel, so there is no central oversight of how fast requests are made to the 3rd-party API.
From Managing concurrency for a Lambda function - AWS Lambda:
To ensure that a function can always reach a certain level of concurrency, you can configure the function with reserved concurrency. When a function has reserved concurrency, no other function can use that concurrency. Reserved concurrency also limits the maximum concurrency for the function, and applies to the function as a whole, including versions and aliases.
Therefore, concurrency can be used to limit the number of simultaneous Lambda functions executing, but this does not necessarily map to "x API calls per second". That would depend upon how long the Lambda function takes to execute (eg 2 seconds) and how many API calls it makes in that time (eg 2 API calls).
It might be necessary to introduce delays either within the Lambda function (not great because you are still paying for the function to run while waiting), or outside the Lambda function (by triggering the Lambda functions in a different way, or even doing the processing outside of Lambda).
The easiest (but not efficient) method might be:
Set a concurrency of 1
Have the Lambda function retry the API call if it is rejected (see the sketch below)
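A sketch of that retry loop; the API client call and the rate-limit exception are placeholders for whatever the 3rd-party SDK actually provides:

import time

class RateLimitError(Exception):
    """Stand-in for the error the 3rd-party client raises on rejection."""

def call_api_with_retry(payload, max_attempts=5):
    # back off exponentially while the API keeps rejecting the call
    for attempt in range(max_attempts):
        try:
            return third_party_api_call(payload)  # hypothetical client call
        except RateLimitError:
            time.sleep(2 ** attempt)  # wait 1 s, 2 s, 4 s, ...
    raise RuntimeError(f'still rate-limited after {max_attempts} attempts')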
Thanks to John Rotenstein, who gave a comprehensive and detailed answer on the SQS part.
If your design is limited to a single consumer, then you may replace SQS with Kinesis streams. By doing so, you can use the Batch Window option of Kinesis to limit the requests made by the consumer; the Batch Window option is used to reduce the number of invocations:
Lambda reads records from a stream at a fixed cadence (e.g. once per second for Kinesis data streams) and invokes a function with a batch of records. Batch Window allows you to wait as long as 300s to build a batch before invoking a function. Now, a function is invoked when one of the following conditions is met: the payload size reaches 6MB, the Batch Window reaches its maximum value, or the Batch Size reaches its maximum value. With Batch Window, you can increase the average number of records passed to the function with each invocation. This is helpful when you want to reduce the number of invocations and optimize cost.

How does AWS Lambda handle concurrency?

I have a lambda that queries a database for a count, and then submits to an SQS queue a number that represents an offset of a query that another lambda works on. The second lambda is triggered by a push onto the queue. If I set the concurrency to 10, does that mean the lambda will act like a threadpool and will continue restarting until the queue is empty?
Example:
Lambda A queries the DB, finds that there are 10000 items in the table, and submits 100 messages to the queue that cover 0 to 10000 in chunks of 100.
Lambda B has a concurrency of 10 and is triggered by puts to the queue; each instance pulls a message, does some work, puts the result somewhere else, and does whatever Lambdas do after their job is done. After those are all done, there are still 90 tasks left: does another pool of 10 start, or does another Lambda take each one's place as soon as it is done?
Since Lambda B has a concurrency of 10, then there will be a maximum of 10 Lambda functions running at any time.
When one Lambda function has completed, another will be triggered until there is nothing left in the SQS queue.
It is likely that AWS Lambda will create 10 Lambda containers, and each container will be re-used on subsequent calls.
See: Understanding Container Reuse in AWS Lambda | AWS Compute Blog
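That container re-use is also why per-container resources, such as a database connection, are usually created outside the handler. A minimal sketch (the connection factory and worker are placeholders, not from the question):

# created once per container at cold start, then re-used by every
# subsequent invocation that lands on this warm container
db_connection = create_db_connection()  # hypothetical connection factory

def handler(event, context):
    # each of the (up to) 10 concurrent containers holds one connection
    for record in event['Records']:
        do_work(record, db_connection)  # hypothetical per-message worker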

How can I distribute multiple concurrent requests to aws lambda functions?

I want to build a cronjob-like system that gets all users from the database and makes multiple (I mean lots of) concurrent requests for each of them, performs some executions, and saves the results to the DB. It will run every hour, 24/7.
I came up with the solution that:
Gets all users from db (that's the easy part)
Dynamically creates lambda functions and distributes all users to these functions
Each lambda function makes concurrent requests and executions (handling results and saving them to the db)
Communicate these functions with SNS when needed
So, does my approach make sense for this situation?
The most important thing here is scaling (that's why I thought of distributing all users across Lambda functions, to limit concurrent requests and resources). How can we come up with a scalable and efficient design for an exponentially increasing user count?
Or any other suggestions?
Here is my solution:
If 100 concurrent Lambdas are not enough for your needs, create a support ticket to increase your limit; you will only be charged for what you use.
However, you still can't determine how many Lambdas will be required in the future. It is not necessary to process each user in a separate Lambda; instead, you can invoke a Lambda with a chunk of user data. For example, let's say your max Lambda limit is 100 and there are 1000 users; then you can do something like this (I don't know Go; here is some Python code which may not be 100% syntactically correct):
users = get_users_fromdb()  # users = [1, 2, 3, ... 1000]
number_of_users = len(users)
chunk_size = number_of_users // 100  # 100 is your lambda limit
for i in range(0, number_of_users, chunk_size):
    # e.g. chunk_users_data = [1, 2, 3, ... 10]
    chunk_users_data = users[i : i + chunk_size]
    invoke_lambda_to_process_users_chunk_data(chunk_users_data)
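The invoke helper is not shown in the answer; with boto3 it could look like this sketch (the worker function name is an assumption):

import json
import boto3

lambda_client = boto3.client('lambda')

def invoke_lambda_to_process_users_chunk_data(chunk):
    # 'Event' means asynchronous invocation, so the chunks fan out in parallel
    lambda_client.invoke(
        FunctionName='process_users_chunk',  # hypothetical worker Lambda
        InvocationType='Event',
        Payload=json.dumps({'users': chunk}),
    )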
Here is what you can do in the other Lambda:
users = event.get('users')
for user in users:
    try:
        process_user(user)
    except Exception as e:
        print(e)  # handle the exception / error if you want
Update:
By default, 100 is the limit for concurrently running Lambdas. If you have 100K users, IMO you should open a support case to increase your account's concurrent Lambda limit to 1000 or more (I am working on Lambda and we have a 10K limit). One more thing to keep in mind: it is not guaranteed that one Lambda invocation will be able to process all users in a chunk, so add some logic to re-invoke with the remaining users before the timeout. A Lambda can run for a maximum of 5 minutes, and you can get the remaining time, in milliseconds, from the context object.
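A sketch of that re-invocation guard, using the remaining-time method on the context object (process_user and the payload shape are carried over from the code above):

import json
import boto3

lambda_client = boto3.client('lambda')

def handler(event, context):
    users = event.get('users', [])
    while users:
        # keep a ~10-second safety margin before the function times out
        if context.get_remaining_time_in_millis() < 10000:
            lambda_client.invoke(
                FunctionName=context.function_name,  # re-invoke this function
                InvocationType='Event',
                Payload=json.dumps({'users': users}),
            )
            return
        process_user(users.pop(0))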
