We are using a serverless architecture with AWS Lambda behind API Gateway. The Lambda's execution time is in the range of a few milliseconds, yet the final response at the client's end arrives in seconds, which is far more than the Lambda's execution time even when the init duration is taken into account for cold-start scenarios.
While debugging this with the API Gateway logs, I can see integration latency of several seconds, which makes the end-to-end response considerably slow. To eliminate the init duration (cold start), I added CloudWatch rules that periodically invoke the Lambdas to keep them warm.
This removed the init duration completely and reduced the integration latency as well. However, some Lambdas cannot be invoked on a schedule because they require authentication, so for those I added provisioned concurrency of 5.
Even so, this Lambda still shows an init duration in the logs. Provisioning is supposed to be another way to get rid of cold starts, but it has had no impact on the time at which the Lambda's response becomes available at API Gateway.
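For context, the keep-warm handlers short-circuit the scheduled pings roughly like this (a minimal sketch, not my exact code; the "source" check assumes the CloudWatch/EventBridge scheduled-event shape):

import json

def handler(event, context):
    # CloudWatch/EventBridge scheduled events arrive with source "aws.events";
    # return early so the warm-up ping does not run the real business logic.
    if event.get("source") == "aws.events":
        return {"statusCode": 200, "body": "warm-up ping"}

    # ... normal API Gateway request handling ...
    return {"statusCode": 200, "body": json.dumps({"ok": True})}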
I followed the links below to assign provisioned concurrency to the Lambdas (a sketch of the equivalent API call follows the list):
Provisioned Concurrency: What it is and how to use it with the Serverless Framework
AWS News Blog – Provisioned Concurrency for Lambda Functions
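For reference, a minimal boto3 sketch of what those guides configure; the function name and alias below are placeholders, not the actual ones in my stack:

import boto3

lambda_client = boto3.client("lambda")

# Provisioned concurrency must target a published version or an alias
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-auth-protected-function",   # placeholder name
    Qualifier="live",                            # placeholder alias
    ProvisionedConcurrentExecutions=5,
)

# Check whether the provisioned instances are ready
status = lambda_client.get_provisioned_concurrency_config(
    FunctionName="my-auth-protected-function",
    Qualifier="live",
)
print(status["Status"], status.get("AvailableProvisionedConcurrentExecutions"))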
CloudWatch logs of the Lambda to which I added provisioned concurrency:
Duration: 1331.38 ms Billed Duration: 1332 ms Memory Size: 256 MB Max Memory Used: 130 MB Init Duration: 1174.18 ms
One thing I noticed in the API Gateway and Lambda logs is that the request was sent from API Gateway at 2021-02-15T11:51:36.621+05:30 but received at the Lambda at 2021-02-15T11:51:38.535+05:30, a delay of about 2 seconds before the request even reached the Lambda.
AWS X-RAY TRACING
I have enabled AWS X-Ray tracing for both API Gateway and Lambda, and this is what the traces show: the request took 595 ms in total, yet in Postman the response was received after 1558 ms. Where is the extra second of delay being added before the response is received from API Gateway?
I believe the reason is that the provisioned concurrency of 5 is not enough and you still run into cold starts of your Lambda function. When the external service calls your API endpoint (i.e. your Lambda function behind API Gateway), your function is warm with 5 instances. If we assume each instance can handle 2 requests per second (500 ms per invocation), then those 5 instances can handle roughly 10 requests per second. If the external service is making 20 requests per second, AWS Lambda tries to spin up new instances because the existing ones are busy handling requests. The consequence is that the external service experiences high response times because of cold starts of your function.
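As a rough back-of-the-envelope check (the numbers below are illustrative, taken from the assumption above rather than from measurements):

# Capacity estimate for a synchronously invoked Lambda function
provisioned_instances = 5
avg_duration_ms = 500                         # each invocation occupies an instance for ~500 ms

per_instance_rps = 1000 / avg_duration_ms     # 2 requests per second per instance
warm_capacity_rps = provisioned_instances * per_instance_rps
print(warm_capacity_rps)                      # 10 req/s; traffic above this spills into cold starts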
Also, consider that the instances of your Lambda function do not live "forever" but are cleaned up at some point. That is, if your traffic is spiky, the extra instances spun up for one spike may live for something like 15 minutes before AWS Lambda shuts them down and keeps only the 5 provisioned ones; when the next spike arrives, you will see the same problem as before.
Please note: this is a very simplified explanation of what's happening behind the scenes, and more of an educated guess based on your description. It would help if you provided some example numbers (e.g. init duration, execution duration, response time) and maybe some example code showing what you're doing in your Lambda function. Also: which runtime are you using, and what does your traffic pattern look like?
Potential solutions
Reduce the cold start time of your Lambda functions -> always a good idea for Lambda functions that are behind an API Gateway
Provision more instances -> only possible up to a certain (soft) limit
===== Flow-of-Services =====
API GW ⮕ Lambda function(Provisioned)
===== Query =====
You want to understand why there is latency while the request is processed from API GW to the Lambda function.
===== Time-stamp-of-CW =====
2021-02-15T11:51:36.621+05:30
2021-02-15T11:51:38.535+05:30
Lambda duration - Duration: 1331.38 ms Billed Duration: 1332 ms Memory Size: 256 MB Max Memory Used: 130 MB Init Duration: 1174.18 ms
===== Follow-up questions =====
While the request was processed from API GW to the Lambda function, the execution environment took 1174.18 ms (~1.1 s) to become active and executed your code in the remaining ~0.3 s, which makes a total of about 1.4 seconds.
Q. What is the type of processor you are using?
Q. Type of API & Endpoint type?
===== K.C =====
You should read the AWS documentation on optimizing your Lambda function code, in particular the section on optimizing static initialization.
Lambda won't charge you for the time it takes to initialize your code (e.g. imports and other module-level setup) as long as it finishes in about X seconds.
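As a minimal illustration of that idea (the client and table name below are placeholders, not taken from the question): expensive setup goes at module level so it runs once per execution environment instead of on every invocation.

import boto3

# Module-level setup runs during the init phase (reported as Init Duration)
# and is reused by every subsequent invocation in this execution environment.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-table")   # placeholder table name

def handler(event, context):
    # Only the fast per-request work runs inside the handler
    item = table.get_item(Key={"id": event.get("id", "unknown")}).get("Item")
    return item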
===== Replication Observation =====
Without provisioned concurrency:
API GW execution time - 286 ms
Initialization - 195 ms
Invocation - 11 ms
Overhead - 0 ms
With provisioned concurrency:
API GW execution time - 1.103 ms
Initialization - 97 ms
Invocation - 1 ms
Overhead - 0 ms
I'm in the us-west-2 region and making the request from 12,575 km away from the Region. I have a REST API configured with the 'Regional' endpoint type. The Lambda function runs on the x86_64 architecture (64-bit, for x86-based processors).
-- Check whether you have optimized your Lambda function code.
-- For lower latency, you may make use of an edge-optimized REST API. An edge-optimized API endpoint is best for geographically distributed clients: API requests are routed to the nearest CloudFront Point of Presence (POP). A boto3 sketch follows this list.
-- Always choose the Region which is closest to the high-traffic region.
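A hedged boto3 sketch of creating an edge-optimized REST API (the API name is a placeholder; an existing API can also be switched from Regional to Edge via the console or update-rest-api):

import boto3

apigw = boto3.client("apigateway")

api = apigw.create_rest_api(
    name="my-edge-optimized-api",                # placeholder name
    endpointConfiguration={"types": ["EDGE"]},   # "REGIONAL" was the original setup
)
print(api["id"])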
References:
new-provisioned-concurrency-for-lambda-functions
provisioned-concurrency.html#optimizing-latency
My understanding is that a single Lambda@Edge instance can only handle one request at a time, and AWS will spin up new instances if all existing instances are busy serving a request.
My lambda has a heavy instance startup cost (~2 seconds) but a very light execution cost. It triggers on viewer requests, which always come in batches of ~20 (loading a single-page application). This means one user loading the app, on a cold start, will start ~20 lambda instances and take ~2 seconds.
But due to the very light execution cost, a single lambda instance could handle all 20 requests and it would still take only ~2 seconds.
An extra advantage is, since each instance connects to a 3rd party service on startup, there would be only 1 open connection instead of 20.
Is this possible?
Lambda@Edge supports neither reserved nor provisioned concurrency.
Here is the link to the documentation for reference: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/edge-functions-restrictions.html#lambda-at-edge-function-restrictions
That being said, with a 2 s cold start you might consider using a standard Lambda instead.
Also, can’t you reduce that cold start somehow?
I am using AWS with the Serverless Framework. My Lambda function gets triggered by an event and then talks to a database, and there is a limit on the number of connections I can open to the DB.
So I want at most 5 Lambda functions running at a time, with other events queued. I know there is:
provisionedConcurrency: 3 # optional, Count of provisioned lambda instances
reservedConcurrency: 5 # optional, reserved concurrency limit for this function. By default, AWS uses account concurrency limit
So in this case, the specified number of long-running instances will be there, serving the events.
But what I actually want is event queuing, with the functions triggered such that at most 5 are running at any time.
I am wondering whether this notion of event queuing is supported in AWS?
In AWS Lambda, a concurrency limit determines how many function invocations can run simultaneously in one region. You can set this limit through the AWS Lambda console or through the Serverless Framework.
If your account limit is 1000 and you reserved 100 concurrent executions for a specific function and 100 concurrent executions for another, the rest of the functions in that region will share the remaining 800 executions.
If you reserve concurrent executions for a specific function, AWS Lambda assumes that you know how many to reserve to avoid performance issues. Functions with allocated concurrency can’t access unreserved concurrency.
The right way to set the reserved concurrency limit in Serverless Framework is the one you shared:
functions:
  hello:
    handler: handler.hello # required, handler set in AWS Lambda
    reservedConcurrency: 5 # optional, reserved concurrency limit for this function. By default, AWS uses account concurrency limit
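The same limit can also be applied outside the Serverless Framework with the AWS SDK; a minimal boto3 sketch (the function name matches the example above):

import boto3

boto3.client("lambda").put_function_concurrency(
    FunctionName="hello",              # function from the example above
    ReservedConcurrentExecutions=5,    # at most 5 concurrent executions
)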
I would suggest using SQS to manage your queue. One of the common architectural reasons for using a queue is to limit the pressure on a different part of your architecture. This could mean preventing overloading a database or avoiding rate limits on a third-party API when processing a large batch of messages.
For example, let's think about your case, where your SQS processing logic needs to connect to a database. You want to limit your workers to no more than 5 open connections to your database at a time; with concurrency control, you can set proper limits to keep your architecture up.
In your case you could have a function, hello, that receives your requests and puts them in an SQS queue. On the other side, the function compute will get those SQS messages and process them, limiting the number of concurrent invocations to 5.
You can even set a batch size, which is the number of SQS messages that can be included in a single Lambda invocation.
functions:
  hello:
    handler: handler.hello
  compute:
    handler: handler.compute
    reservedConcurrency: 5
    events:
      - sqs:
          arn: arn:aws:sqs:region:XXXXXX:myQueue
          batchSize: 10 # how many SQS messages can be included in a single Lambda invocation
          maximumBatchingWindow: 60 # maximum amount of time in seconds to gather records before invoking the function
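A sketch of what handler.compute could look like (the processing step is a placeholder); the point is that one invocation receives up to batchSize messages, and reservedConcurrency: 5 caps how many of these handlers run in parallel:

import json

def compute(event, context):
    # One invocation receives up to `batchSize` SQS messages
    for record in event["Records"]:
        payload = json.loads(record["body"])
        process(payload)               # e.g. write to the database over one connection

def process(payload):
    # placeholder for the real database work
    print(payload)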
Have you considered a proxy endpoint (acting like a connection pool) instead of limiting the concurrency of the Lambda? Also, I think the Lambda <-> SQS communication happens via a pool of pollers, and setting the concurrency lower than however many pollers are running may force you to handle lost messages.
https://aws.amazon.com/rds/proxy/
I have an API that takes a JSON object and forwards it to Azure Event Hub. The API runs on .NET Core 3.1 with Event Hub SDK 3.0, and it also has Application Insights configured to collect dependency telemetry, including Event Hub calls.
Using the following Kusto query in Application Insights, I found that some calls to Event Hub have really high latency (the highest is 60 seconds; on average they fall around 3-7 seconds).
dependencies
| where timestamp > now()-7d
| where type == "Azure Event Hubs" and duration > 3000
| order by duration desc
It is also worth noting that this returns 890 results, out of 4.6 million Azure Event Hubs dependency results.
I've checked the Event Hub metrics blade in the Azure Portal: the average (at 1-minute granularity) incoming/outgoing requests are well below the throughput unit limit (I have 2 event hubs in one EH namespace, 1 TU, auto-scale up to 20 max), around 50-100 messages per second and roughly 100 kB, both incoming and outgoing. There are 0 throttled requests and 1-2 server/user errors from time to time.
There are spikes, but they do not exceed the throughput limit, and the slow dependency timestamps don't match these spikes either.
I also increased the throughput units to 2 manually, and it did not change anything.
My question is:
Is it normal to sometimes have extremely high latency to Event Hub? Or is it acceptable as long as it affects only a small fraction of requests?
Code-wise, I use only one EventHubClient instance to send all the requests; is that bad practice, or should I use something else like a client pool?
A support engineer also told me that, at a timestamp where I see high latency in Application Insights, the Event Hub logs do not show such high latency (322 ms max). Without going into details, is it possible for Application Insights to produce wrong performance telemetry?
We have an AWS Lambda written in Java that usually completes in about 200 ms. Occasionally, it times out after 5 seconds (our configured timeout value).
I understand that there is occasional added latency due to container setup (though, I'm not clear if that counts against your execution time). I added some debug logging, and it seems like the code just runs slow.
For example, a particularly noticeable log entry shows a call to HttpClients.createDefault usually takes less than 200 ms (based on the fact that the Lambda executes in less than 200 ms), but when the timeout happens, it takes around 2-3 seconds.
2017-09-14 16:31:28 DEBUG Helper:Creating HTTP Client
2017-09-14 16:31:31 DEBUG Helper:Executing request
Unless I'm misunderstanding something, it seems like any latency due to container initialization would have already happened. Am I wrong in assuming that code execution should not have dramatic differences in speed from one execution to the next? Or is this just something we should expect?
Setting up new containers or replacing cold containers takes some time. Both count against your time. The time you see in the console is the time you are billed against.
I assume that Amazon doesn't charge for the provisioning of the container, but they will certainly start the timer as soon as your runtime is started. You are likely to pay for the time during which the SDK/JDK gets initialized and loads its classes. They are certainly not charging us for starting the operating system which hosts the containers.
Running a simple Java Lambda twice shows the different times for new and reused instances: the first run takes 374.58 ms and the second 0.89 ms, with billed durations of 400 ms and 100 ms respectively; for the second run the container was reused. While you can try to keep your containers warm, as already pointed out by @dashmug, AWS will occasionally recycle the containers and spawn new ones as load increases or decreases. The blogs "How long does AWS Lambda keep your idle functions around before a cold start?" and "How does language, memory and package size affect cold starts of AWS Lambda?" might be worth a look as well. If you include external libraries, your times will increase. The latter blog shows that for Java with smaller memory allocations, cold starts can regularly exceed 2-4 seconds.
Looking at these times, you should probably increase your timeout, and for an actual timeout event look not only at the log output produced by your application but also at the START, END and REPORT entries. Each running Lambda container instance seems to create its own log stream. Consider keeping your Lambdas warm if they aren't called that often.
05:57:20 START RequestId: bc2e7237-99da-11e7-919d-0bd21baa5a3d Version: $LATEST
05:57:20 Hello from Lambda com.udoheld.aws.lambda.HelloLogSimple.
05:57:20 END RequestId: bc2e7237-99da-11e7-919d-0bd21baa5a3d
05:57:20 REPORT RequestId: bc2e7237-99da-11e7-919d-0bd21baa5a3d Duration: 374.58 ms Billed Duration: 400 ms Memory Size: 128 MB Max Memory Used: 44 MB
05:58:01 START RequestId: d534155b-99da-11e7-8898-2dcaeed855d3 Version: $LATEST
05:58:01 Hello from Lambda com.udoheld.aws.lambda.HelloLogSimple.
05:58:01 END RequestId: d534155b-99da-11e7-8898-2dcaeed855d3
05:58:01 REPORT RequestId: d534155b-99da-11e7-8898-2dcaeed855d3 Duration: 0.89 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 44 MB
Try to keep your function always warm and see if it would make a difference.
If the timeout is really due to container warmup, then keeping it warm will greatly help reduce the frequency of these timeouts. You'd still get cold starts when you deploy changes but at least that's predictable.
https://read.acloud.guru/how-to-keep-your-lambda-functions-warm-9d7e1aa6e2f0
For Java-based applications the warm-up period is longer because, as you know, the JVM has to start. Node.js or Python are better in this respect because their warm-up periods are shorter. If you are not in a position to switch the tech stack, simply keep the container warm by triggering it periodically, or increase the memory, which will reduce the execution time since Lambda allocates more CPU for larger memory sizes.
I needed to implement a stream solution using AWS Kinesis streams & Lambda.
Lambda function 1 -
It adds data to the stream and is invoked every 10 seconds. Each invocation adds 100 records (1 KB each) to the stream, and I am running two instances of the script that invokes this Lambda function.
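Roughly, this producer does the following (a sketch assuming boto3; the stream name is a placeholder, not my actual stream):

import json
import uuid
import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    # Build 100 records of roughly 1 KB each
    records = [
        {
            "Data": json.dumps({"seq": i, "payload": "x" * 1000}).encode("utf-8"),
            "PartitionKey": str(uuid.uuid4()),
        }
        for i in range(100)
    ]
    # PutRecords accepts up to 500 records per call
    resp = kinesis.put_records(StreamName="my-stream", Records=records)
    return {"failed": resp["FailedRecordCount"]}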
Lambda function 2 -
This Lambda uses the above stream as its trigger. With a small volume of data or per-interval load, the second Lambda receives the data almost immediately. But at the volumes above, the data arrives increasingly late (about 10 minutes behind after more than an hour of streaming).
I checked the logic of both Lambda functions and verified that the first Lambda does not add latency before pushing data to the stream. I also verified this from the stream records in the second Lambda, where the difference between approximateArrivalTimestamp and the current time is clearly increasing.
Kinesis itself did not show any issues or throttling in its analytics (I am using 1 shard).
Are there any architectural changes I need to make to have this run more smoothly? I need to scale up at least 10 times, e.g. 20 invocations of the first Lambda with 200 records each and a timeout of 1-10 seconds, as later benchmarks.
I am using 100 as the batch size. Can increasing or decreasing it give any advantage?
UPDATE: As I explored more online, I found the idea of implementing an async, front-facing Lambda on the Kinesis trigger which in turn invokes the actual Lambda asynchronously, so the Lambda processing time does not become the bottleneck (sketched below). However, this approach also failed with the same latency issue. I have checked the execution time: the front-facing Lambda finishes within 1 second, but I still get a big gap between approximateArrivalTimestamp and the current time in both Lambdas.
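For clarity, the front-facing pattern amounts to roughly this (a sketch assuming boto3; the target function name is a placeholder):

import json
import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    # InvocationType="Event" returns immediately (HTTP 202); the heavy work
    # happens in the target function, off the Kinesis poller's critical path.
    lambda_client.invoke(
        FunctionName="actual-processing-function",   # placeholder name
        InvocationType="Event",
        Payload=json.dumps(event).encode("utf-8"),
    )
    return {"accepted": True}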
Please help!
For one shard, there will be only one instance of the second Lambda.
So it works like this for the second Lambda: it reads the configured batch size from the stream and processes those records, and it won't read further records until the previous batch has been successfully processed.
With a second shard, you would have 2 Lambda instances processing records in parallel. So the way I see to scale this architecture is to increase the number of shards; however, make sure the data is evenly distributed across shards.
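One simple way to keep the distribution even once you add shards (a sketch; the stream name is a placeholder) is to use a high-cardinality partition key, e.g. a random UUID, instead of a constant value:

import uuid
import boto3

kinesis = boto3.client("kinesis")

def put(data: bytes):
    kinesis.put_record(
        StreamName="my-stream",            # placeholder name
        Data=data,
        PartitionKey=str(uuid.uuid4()),    # hashes records across all shards
    )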