Stress Test - API Gateway + AWS Lambda - performance

I am running a stress test on my system, which is built with AWS API Gateway + AWS Lambda.
I am configuring 2K virtual users, each making 1 transaction, with a ramp-up of 1 minute.
With a dummy Lambda, the system handles the load.
If I change my Lambda to include a sleep(5), I start to see errors on my dashboard. They are 5xx errors, but there is no logging information for the Lambda function. It seems the Lambda function was never invoked... the request was "blocked" at API Gateway.

There is a possibility that you are hitting the Lambda concurrency limit.
Each AWS account has an overall AccountLimit value that is fixed at any point in time but can be easily increased as needed. As of May 2017, the default limit is 1,000 concurrent executions per AWS Region.
Also check the API Gateway throttling limits, including any set at the method level for your project (the default is 10,000 requests per second with a burst of 5,000).
Also make sure you inform AWS that you are doing a stress test, as there is a possibility that they might block you.
You can have a look at the CloudWatch logs for both API Gateway and the Lambdas, which might give some more insight.
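As a rough sketch (Python with boto3; the region and function name are placeholders for this example), you can read the account concurrency limit and check whether the Throttles metric spikes during the test, since throttled invocations never reach the function and therefore never log anything:

    # Sketch, not a definitive implementation: check the account-level concurrency
    # limit and whether the stress test is being throttled. Assumes boto3 and
    # credentials are configured; "my-function" / "us-east-1" are placeholders.
    from datetime import datetime, timedelta
    import boto3

    lambda_client = boto3.client("lambda", region_name="us-east-1")
    settings = lambda_client.get_account_settings()
    print("Concurrent execution limit:", settings["AccountLimit"]["ConcurrentExecutions"])
    print("Unreserved concurrency:", settings["AccountLimit"]["UnreservedConcurrentExecutions"])

    # Throttles show up as a CloudWatch metric even when the function never runs.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Throttles",
        Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
        StartTime=datetime.utcnow() - timedelta(minutes=15),
        EndTime=datetime.utcnow(),
        Period=60,
        Statistics=["Sum"],
    )
    print("Throttles per minute:", [(p["Timestamp"], p["Sum"]) for p in stats["Datapoints"]])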

Related

AWS X-Ray integration with Lambda and Kinesis

I have a system that pushes messages to a Kinesis stream. A Lambda consumes this stream and sends the relevant messages to another Lambda. I enabled AWS X-Ray in the second Lambda, but I found that it does not sample most of the messages, i.e. it sets the X-Ray trace header 'Sampled' to false, so I cannot trace the requests.
Is there any way to tell the Lambda to trace all requests, or any workaround?
github issue: https://github.com/aws/aws-xray-sdk-node/issues/567
AWS Lambda has a fixed sampling rate for traces and it cannot be modified: https://docs.aws.amazon.com/lambda/latest/dg/services-xray.html
"X-Ray doesn't trace all requests to your application. X-Ray applies a sampling algorithm to ensure that tracing is efficient, while still providing a representative sample of all requests. The sampling rate is 1 request per second and 5 percent of additional requests."

How to do performance testing of a Lambda which is triggered by EventBridge?

I have a Lambda that is triggered automatically whenever an event is dropped onto the event bus it is connected to. How can I performance test it to see how it performs when 500 events are dropped at a time?
Also, I know AWS has some built-in metrics like Lambda execution time, X-Ray tracing, etc. Can anyone let me know how to use them for my use case?
If by "eventbus" you mean AWS EventBus which is a part of Amazon EventBrigde my expectation is that the easiest would be using PutEvents API endpoint, you can come up with a JSON payload having 500 events or make 500 separate calls with 1 event or any combination you can think of.
Be aware that the AWS API requests need to be signed to the load testing tool you choose must have the possibility to calculate this signature. A guide for Apache JMeter: How to Handle Dynamic AWS SigV4 in JMeter
With regards to metrics - check out AWS CloudWatch
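As a rough sketch (Python with boto3; the bus name, source and detail-type are placeholders), you could push the 500 test events with PutEvents, batching them 10 at a time since a single PutEvents call accepts at most 10 entries:

    # Sketch: fire 500 test events at an EventBridge event bus, 10 per PutEvents
    # call. Bus name, source and detail-type are placeholders for this example.
    import json
    import boto3

    events = boto3.client("events")

    entries = [
        {
            "Source": "perf.test",
            "DetailType": "LoadTestEvent",
            "Detail": json.dumps({"sequence": i}),
            "EventBusName": "my-event-bus",
        }
        for i in range(500)
    ]

    for i in range(0, len(entries), 10):
        response = events.put_events(Entries=entries[i : i + 10])
        if response["FailedEntryCount"]:
            print("Failed entries in batch", i, response["Entries"])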

Best way to schedule one-time events in serverless environments

Example use case
Send the user a notification 2 hours after signup.
Options considered
setTimeout(() => { /* send notification */ }, 2*60*60*1000); is not an option in serverless environments since the function terminates after execution (so it has to be stateless).
CloudWatch events can schedule lambda invocations using cron expressions - but this was designed for repetitive invocations (there's a limit of 100 rules/region).
I have not seen scheduling options in AWS SNS/SQS or GCP Pub/Sub. Are there alternatives with scheduling?
I want to avoid (if possible) setting up a dedicated message broker (overkill) or stateful/non-serverless instance - is there a serverless way to do this?
I can queue the events in a database and invoke a lambda function every minute to poll the database for events to execute in that minute... is there a more elegant solution?
Use AWS Step Functions; they are serverless workflows that don't have the 15-minute limit that AWS Lambda does. You can design a workflow in Step Functions that integrates with API Gateway, Lambda and SNS to send email and text notifications as follows:
Create a REST API via API Gateway that will invoke a Lambda function, passing in, for example, the destination address (email, phone #) of the SNS notification, when it should be sent, and the notification method (e.g. email, text, etc.).
The Lambda function, on invocation, will start the Step Functions execution, passing in the data (Lambda is needed because API Gateway currently can't invoke Step Functions directly).
The Step Function is basically a workflow: you can define states for waiting (like waiting for the specified time to send the notification, e.g. 30 seconds), and states for invoking other Lambda functions that can use SNS to send out email and/or text notifications.
A rudimentary example is provided by AWS w/ their Task Timer example.
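A minimal sketch of the second step (Python; the state machine ARN, environment variable and input field names are assumptions for this example), where the Lambda behind the REST API starts a Step Functions execution whose Wait state reads the delivery time from the input:

    # Sketch of the Lambda invoked by API Gateway: it starts a Step Functions
    # execution and passes through the notification details. The state machine
    # (assumed to exist) would use a Wait state with TimestampPath "$.sendAt"
    # followed by a Task state that publishes to SNS.
    import json
    import os
    import boto3

    sfn = boto3.client("stepfunctions")

    def handler(event, context):
        body = json.loads(event["body"])
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],
            input=json.dumps({
                "sendAt": body["sendAt"],            # ISO-8601 time to deliver
                "destination": body["destination"],  # email address or phone number
                "method": body["method"],            # "email" or "text"
            }),
        )
        return {"statusCode": 202, "body": json.dumps({"scheduled": True})}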
Things are coming on GCP for doing this, but not very soon. For now, the solution is to poll a database.
You can do that with Datastore/Firestore with the execution datetime indexed (to avoid reading all the documents each minute). But be careful of traffic spikes; you could create hotspots.
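A rough sketch of that polling approach (Python with the Firestore client; the collection and field names "scheduled_notifications", "sendAt" and "sent" are made up for the example), run every minute from a scheduled function:

    # Sketch: scheduled function that reads only the documents whose indexed
    # sendAt has passed, delivers them and marks them as sent.
    from datetime import datetime, timezone
    from google.cloud import firestore

    db = firestore.Client()

    def send_notification(payload):
        # Placeholder for the actual delivery (e.g. an email/SMS API call).
        print("sending", payload)

    def poll_due_events(request):
        now = datetime.now(timezone.utc)
        due = (
            db.collection("scheduled_notifications")
            .where("sent", "==", False)
            .where("sendAt", "<=", now)
            .stream()
        )
        for doc in due:
            send_notification(doc.to_dict())
            doc.reference.update({"sent": True})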
You can use Cloud Scheduler on Google Cloud Platform. As is stated in the official documentation:
Cloud Scheduler is a fully managed enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch, big data jobs, cloud infrastructure operations, and more. You can automate everything, including retries in case of failure to reduce manual toil and intervention. Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.
Here you can check a quickstart for using it with Pub/Sub and Cloud Functions.

concurrent requests to AWS Lambda

I have an AWS lambda function that launches an AWS Batch job. I call the lambda function within R like this:
result <- httr::POST(url, body = toJSON(job, auto_unbox = TRUE))
Where url is (some details redacted):
https://XXXXXXXXXX.execute-api.ca-central-1.amazonaws.com/Prod/job
This works great when the requests are submitted sequentially. However, if I submit the job from even a small cluster (i.e. 10 nodes), I get a lot of 502 responses, which IIUC means the Lambda API endpoint is refusing the connection due to excessive traffic.
If I throttle the requests it works as desired.
But that does not seem like very high traffic (at most, 10 concurrent requests). My questions are: 1) am I interpreting the 502 response correctly, and 2) what are the concurrent request limits for Lambda requests via API Gateway?
Based on helpful comments above, it became apparent the problem was not concurrent requests but timeouts from the Lambda function. This was evident in the logs. So when receiving 502 responses from your Lambda API endpoint, inspect the CloudWatch logs for further details, including timeouts.
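For reference, a small sketch (Python with boto3; the function name and time window are placeholders) of pulling the timeout entries out of the function's CloudWatch log group after a burst of 502 responses:

    # Sketch: search the Lambda's log group for timeout messages.
    from datetime import datetime, timedelta
    import boto3

    logs = boto3.client("logs")

    start = int((datetime.utcnow() - timedelta(hours=1)).timestamp() * 1000)
    resp = logs.filter_log_events(
        logGroupName="/aws/lambda/my-batch-launcher",  # placeholder function name
        startTime=start,
        filterPattern='"Task timed out"',
    )
    for event in resp["events"]:
        print(event["timestamp"], event["message"].strip())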

What are the drawbacks of SQS poller which AWS Lambda removes?

I have an architecture which looks as follows:
Multiple SNS -> (AWS Lambda or SQS with Poller)??? -> DynamoDB
So, basically, either AWS Lambda or SQS with a poller is subscribed to multiple SNS topics, and that component pushes data to DynamoDB.
But this "?" component does a lot of message transformation in between. So for this case I can use either AWS Lambda or SQS with a poller: with AWS Lambda I can do the transformation in the Lambda function, and with SQS I can do the transformation in the poller. With AWS Lambda I see one problem: the code would become quite large, as the transformation is quite complex (it has a lot of rules), so I am thinking of using SQS. But before finalising on SQS, I wanted to know the drawbacks of SQS that AWS Lambda removes.
Please help. Let me know if you need further information.
Your question does not contain much detail, so I shall attempt to interpret your needs.
Option 1: SQS Polling
Information is sent to an Amazon SNS topic
An SQS queue is subscribed to the SNS topic
An application running on Amazon EC2 instance(s) regularly polls the SQS queue to ask for a message
If a message is available, the data in the message is transformed and saved to an Amazon DynamoDB table
This approach is good if the transformation takes a long time to process. The number of EC2 instances can be scaled based upon the amount of work in the queue. Multiple messages can be received at the same time. It is a traditional message-based approach.
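A minimal sketch of the Option 1 consumer (Python with boto3; the queue URL, table name and transformation are placeholders), as it would run on the EC2 instances:

    # Sketch of the Option 1 poller: long-poll SQS, transform each message,
    # write it to DynamoDB, then delete it from the queue.
    import json
    import boto3

    sqs = boto3.client("sqs")
    table = boto3.resource("dynamodb").Table("my-table")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

    def transform(message_body):
        # Placeholder for the (complex) transformation rules.
        return json.loads(message_body)

    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            item = transform(msg["Body"])
            table.put_item(Item=item)
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])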
Option 2: Using Lambda
Information is sent to an Amazon SNS topic
An AWS Lambda function is subscribed to the SNS topic
A Lambda function is invoked when a message is sent to the SNS topic
The Lambda function transforms the data in the message and saves it to an Amazon DynamoDB table
AWS Lambda functions were limited to five minutes of execution time when this was written (the limit is now 15 minutes), so this approach will only work if the transformation process can be completed within that timeframe.
No servers are required because Lambda will automatically run multiple functions in parallel. When no work is to be performed, no Lambda functions execute and there is no compute charge.
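And a sketch of the Option 2 handler (Python; the table name and transformation are placeholders), where the transformation lives inside the Lambda that SNS invokes:

    # Sketch of the Option 2 Lambda: SNS invokes it with one record per message;
    # the handler transforms the payload and writes it to DynamoDB.
    import json
    import boto3

    table = boto3.resource("dynamodb").Table("my-table")

    def transform(message):
        # Placeholder for the (complex) transformation rules.
        return message

    def handler(event, context):
        for record in event["Records"]:
            message = json.loads(record["Sns"]["Message"])
            table.put_item(Item=transform(message))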
Between the two options, using AWS Lambda is generally more efficient and scalable, but it may vary depending upon your specific workload.
We can now use SQS messages to trigger AWS Lambda Functions.
28 JUN 2018: AWS Lambda Adds Amazon Simple Queue Service to Supported Event Sources
Moreover, you are no longer required to run a message polling service or create an SQS-to-SNS mapping.
The AWS Serverless Application Model supports the new event source as follows:
Type: SQS
Properties:
  Queue: arn:aws:sqs:us-west-2:213455678901:test-queue
  BatchSize: 10
The AWS Console also supports configuring an SQS queue as a Lambda trigger.
Further details:
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

Resources