I have an AWS lambda function that launches an AWS Batch job. I call the lambda function within R like this:
result <- httr::POST(url, body = toJSON(job, auto_unbox = TRUE))
Where url is (some details redacted):
https://XXXXXXXXXX.execute-api.ca-central-1.amazonaws.com/Prod/job
This works great when the requests are submitted sequentially. However, if I submit the job from even a small cluster (i.e. 10 nodes), I get a lot of 502 responses, which IIUC means the Lambda API endpoint is refusing the connection due to excessive traffic.
If I throttle the requests it works as desired.
But that does not seem like very high traffic (at most, 10 concurrent requests). My questions are: 1) am I interpreting the 502 response correctly, and 2) what are the concurrent request limits for Lambda requests via API Gateway?
Based on the helpful comments above, it became apparent that the problem was not concurrent requests but timeouts in the Lambda function, which was evident in the logs. So when you receive 502 responses from your Lambda API endpoint, inspect the CloudWatch logs for further details, including timeouts.
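If it helps, the timeouts are easy to confirm programmatically; a minimal sketch using boto3 that scans the function's log group for Lambda's timeout message (the function name and region are placeholders):

import boto3

# Look for Lambda timeout messages in the function's CloudWatch log group.
# Log groups follow the /aws/lambda/<function-name> convention; the name below is a placeholder.
logs = boto3.client("logs", region_name="ca-central-1")

response = logs.filter_log_events(
    logGroupName="/aws/lambda/my-batch-launcher",
    filterPattern='"Task timed out"',  # the line Lambda writes when it hits its timeout
)

for event in response["events"]:
    print(event["timestamp"], event["message"].strip())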
Related
I have a Lambda that is triggered automatically whenever an event is dropped onto the event bus it is connected to. How can I performance test it to see how it behaves if 500 events are dropped at a time?
Also, I know AWS has some built-in metrics like Lambda execution time, X-Ray tracing, etc. Can anyone let me know how to use them for my use case?
If by "eventbus" you mean AWS EventBus which is a part of Amazon EventBrigde my expectation is that the easiest would be using PutEvents API endpoint, you can come up with a JSON payload having 500 events or make 500 separate calls with 1 event or any combination you can think of.
Be aware that the AWS API requests need to be signed to the load testing tool you choose must have the possibility to calculate this signature. A guide for Apache JMeter: How to Handle Dynamic AWS SigV4 in JMeter
With regards to metrics - check out AWS CloudWatch
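For example, a minimal boto3 sketch of the PutEvents approach (boto3 handles the SigV4 signing for you; the bus name, source, and detail-type are placeholders, and a real load test would fire these batches concurrently rather than in a loop):

import json

import boto3

# Send 500 test events to an EventBridge event bus.
# PutEvents accepts at most 10 entries per request, so the events are chunked.
events_client = boto3.client("events", region_name="us-east-1")

entries = [
    {
        "Source": "load.test",
        "DetailType": "LoadTestEvent",
        "Detail": json.dumps({"sequence": i}),
        "EventBusName": "my-test-bus",
    }
    for i in range(500)
]

for start in range(0, len(entries), 10):
    response = events_client.put_events(Entries=entries[start:start + 10])
    if response["FailedEntryCount"]:
        print("Some entries failed:", response["Entries"])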
Setup:
We have gRPC pods running in a k8s cluster. The service mesh we use is Linkerd. Our gRPC microservices are written in Python (asyncio gRPC as the concurrency mechanism), with the exception of the entry point. That microservice is written in golang (using the Gin framework). We have an AWS API GW that talks to an NLB in front of the golang service. The golang service communicates with the backend via NodePort services.
Requests on our gRPC Python microservices can take a while to complete. Average is 8s, up to 25s in the 99th %ile. In order to handle the load from clients, we've horizontally scaled, and spawned more pods to handle concurrent requests.
Problem:
When we send multiple requests to the system, even sequentially, we sometimes notice that requests go to the same pod as an ongoing request. What can happen is that this new request ends up getting "queued" on the server side (not fully "queued"; some progress gets made when context switches happen). The issue with queueing like this is that:
The earlier requests can start getting starved, and eventually timeout (we have a hard 30s cap from API GW).
The newer requests may also not get handled on time, and as a result get starved.
The symptom we're noticing is 504s which are expected from our hard 30s cap.
What's strange is that we have other pods available, but for some reason the load balancer isn't routing requests to those pods smartly. It's possible that Linkerd's smarter load balancing doesn't work well for our high-latency situation (we need to look into this further; however, that will require a big overhaul of our system).
One thing I wanted to try is to stop this queuing up of requests. I want the service to immediately reject the request if one is already in progress, and have the client (meaning the golang service) retry. The client retry will hopefully hit a different pod (do let me know if that won't happen). In order to do this, I set "maximum_concurrent_rpcs" to 1 on the server side (Python server). When I sent multiple requests in parallel to the system, I didn't see any RESOURCE_EXHAUSTED exceptions (even when there is only 1 server pod). What I do notice is that the requests are no longer happening in parallel on the server; they happen sequentially (I think that's a step in the right direction: the first request doesn't get starved). That being said, I'm not seeing the RESOURCE_EXHAUSTED error in golang. I do see a delay between the entry time in the golang client and the entry time in the Python service. My guess is that the queuing is now happening client-side (or potentially still server-side, but it's not visible to me)?
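For reference, this is roughly how I set the limit on the Python server; a simplified grpc.aio sketch, with the generated servicer registration elided since it depends on our protos:

import asyncio

import grpc


async def serve() -> None:
    # maximum_concurrent_rpcs=1: once one RPC is in flight, further RPCs should be
    # rejected with RESOURCE_EXHAUSTED (the documented behavior) instead of queued.
    server = grpc.aio.server(maximum_concurrent_rpcs=1)
    # add_<Service>Servicer_to_server(<Service>Servicer(), server)  # generated stubs elided
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(serve())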
I then saw online that it may be possible for requests to get queued up on the client side as a default behavior in HTTP/2. I tried to test this out in a custom Python client that mimics the golang one with:
import grpc

# Attempt to cap the number of concurrent HTTP/2 streams on this channel.
channel = grpc.insecure_channel(
    "<some address>",
    options=[("grpc.max_concurrent_streams", 1)],
)
# create stub to server with channel…
However, I'm not seeing any change here either. (Note, this is a test dummy client; eventually I'll need to make this work in golang. Any help there would be appreciated as well.)
Questions:
How can I get the desired effect here? Meaning: the server returns RESOURCE_EXHAUSTED if it is already handling a request, the golang client retries, and the retry hits a different pod?
Any other advice on how to fix this issue? I'm grasping at straws here.
Thank you!
I have implemented an API that adheres to Snowflake's asynchronous external function specification.
In our system, we use AWS API Gateway, a Lambda function, and a third-party API (TPA).
In our scenario, we store certain information in a Snowflake table and try to enrich this table using a Snowflake external user-defined function.
We are able to enrich the table when the number of records is small. If we try to enrich 3 million records, then after a certain time our TPA starts sending HTTP 429. This is an indicator telling our Lambda function to slow down the rate of Snowflake's requests.
We understand this, and the moment the Lambda function gets the HTTP 429, it sends HTTP 429 back to Snowflake in any polling/POST requests. The expectation is that Snowflake will slow down its requests rather than throwing an error and stopping further processing.
Below is the response we return to Snowflake:
{
    "statusCode": 429
}
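For context, a simplified sketch of the handler logic (the call_third_party_api helper is hypothetical; the real handler also implements the asynchronous polling flow):

import json


def lambda_handler(event, context):
    # Simplified sketch: forward the batch of rows from Snowflake to the TPA
    # and propagate throttling back to Snowflake as HTTP 429.
    body = json.loads(event["body"])
    rows = body["data"]

    tpa_status, enriched_rows = call_third_party_api(rows)  # hypothetical helper

    if tpa_status == 429:
        # Ask Snowflake to back off and resend this batch later.
        return {"statusCode": 429}

    return {
        "statusCode": 200,
        "body": json.dumps({"data": enriched_rows}),
    }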
And it is a consistent situation: it looks like Snowflake is not respecting HTTP 429 in the request-reply pattern.
Snowflake does handle HTTP 4xx responses when working with external functions.
Have you engaged support? I have worked with customers having this issue, and the Snowflake team is able to review it.
AWS API Gateway has a default limit of 10,000 rps.
Please review Designing High Performance External Functions:
Remote services should return HTTP response code 429 when overloaded. If Snowflake sees HTTP 429, Snowflake scales back the rate at which it sends rows, and retries sending batches of rows that were not processed successfully.
Your options for resolution are:
Work with AWS to increase your API Gateway rate limit.
However, some proxy services, including Amazon API Gateway and Azure API Management, have default usage limits. When the request rate exceeds the limit, these proxy services throttle requests. If necessary, you might need to ask AWS or Azure to increase your quota on your proxy service.
or
Try using a smaller warehouse, so that Snowflake sends less volume to API Gateway per second. This has the obvious drawback of running more slowly.
I am wondering how Quarkus handles simultaneous requests to, for example, a REST API built with rest-json.
Example:
REST API is called by lots of clients simultaneously
REST API call calls other APIs
REST API processes the response of the other called APIs and returns the processed response
Questions:
Are the requests queued and processed one by one?
Are requests rejected if the API is busy?
Is handling parallelism offloaded to the infrastructure using tools like Istio?
Can someone please point me to some documentation about that or give an explanation? Thank you.
Quarkus uses Vert.x under the hood, which implements an event loop. This means it can handle thousands of requests because its threads are not blocked.
You can read more about it in the Vert.x documentation: https://vertx.io/docs/vertx-core/java/
I am making a stress test in my system that is designed with AWS API Gateway + AWS Lambda.
I am setting 2K Virtual Users each of them making 1 transaction with a Ramp-up of 1 minute.
When making a dummy lambda the system can handle the load.
If I change my Lambda to have a sleep(5), I start to see some errors on my dashboard. They are 5xx errors, but there is no logging information from the Lambda function. It seems that the Lambda function was not called... The request was "blocked" at API Gateway.
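The slow variant is essentially a handler like this (a minimal sketch of the sleep(5) Lambda, not the exact code):

import json
import time


def lambda_handler(event, context):
    # Simulate 5 seconds of work before responding.
    time.sleep(5)
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "done"}),
    }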
There is a possibility that you are hitting the Lambda concurrency limit (see the AWS Lambda documentation).
Each AWS account has an overall AccountLimit value that is fixed at any point in time but can be increased as needed. As of May 2017, the default limit is 1,000 concurrent executions per AWS Region.
Also check the API Gateway throttling limits, in case any are set at the method level for your project (the default is 10,000 rps with a burst of 5,000); see the API Gateway documentation.
Also make sure you inform AWS that you are doing a stress test, as there is a possibility they might block you.
You can have a look at the CloudWatch logs for both API Gateway and the Lambdas, which might give some more insight.
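As a quick check, you can read the account-level concurrency limit with boto3 (the region is a placeholder):

import boto3

# Fetch the account's Lambda concurrency limits for the region under test.
lambda_client = boto3.client("lambda", region_name="us-east-1")

settings = lambda_client.get_account_settings()
print("Concurrent executions limit:", settings["AccountLimit"]["ConcurrentExecutions"])
print("Unreserved concurrency:", settings["AccountLimit"]["UnreservedConcurrentExecutions"])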