I have a continuously running Lambda (30-minute interval) that is timing out when trying to connect to Firestore. I don't really know why this is happening. I have used this at the beginning of the Lambda:
context.callbackWaitsForEmptyEventLoop = false;
Can anyone help me solve this, please?
Does your Lambda function have access to the internet? This is a really common error; you will need to set up your VPC's subnets to allow it to reach the internet.
https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/
There's a 15-minute limit for Lambda functions. If you go over that limit, they'll time out, and there's no way to work around it.
You can see it in the docs:
You can now set the timeout value for a function to any value up to 15
minutes. When the specified timeout is reached, AWS Lambda terminates
execution of your Lambda function. As a best practice, you should set
the timeout value based on your expected execution time to prevent
your function from running longer than intended.
You can also check AWS Lambda Limits. While some of these limits can be raised by contacting AWS, the maximum execution time is not one of them.
If your function runs in less than 15 minutes, you can simply increase the timeout for your function via the console (under Basic settings) or via the aws-cli (or via frameworks such as AWS SAM, Serverless, etc. if you're using one).
Check how to change the limits here
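For example, with the aws-cli you can raise the timeout up to the 900-second maximum (the function name below is just a placeholder):
aws lambda update-function-configuration --function-name my-function --timeout 900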
However, I would try to understand why your function is timing out when connecting to Google's Firestore. I don't know anything about Google Cloud, but maybe you should allow outbound traffic on it. Maybe the timeout should be increased, but maybe Firebase is blocking the outbound traffic, causing your Lambda to time out. If your Lambda is outside a VPC, it should be able to connect to the internet seamlessly, so the connection with Firebase should be fairly quick.
One other thing I suggest is to run your Lambda function on the Node 8 runtime, as you can take advantage of async/await and get rid of the context and callback objects, which are very confusing at first.
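For example, a minimal async handler could look like this (just a sketch: it assumes the firebase-admin SDK with credentials already configured, and the collection/document names are placeholders):
const admin = require('firebase-admin');
admin.initializeApp(); // assumes credentials are picked up from the environment
const db = admin.firestore();

exports.handler = async (event) => {
  // no context/callback needed on the Node 8+ runtime; just return or throw
  const doc = await db.collection('jobs').doc(event.id).get(); // placeholder collection and id
  return { exists: doc.exists };
};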
Related
I have a spring batch application that is hosted on AWS and runs as a lambda function. This lambda function is triggered when a file is dropped in the corresponding S3 bucket.
My question is: what would be the best way to perform health checks in this scenario? If this were a regular service running on EC2 (i.e. constantly running), I'd just schedule a health check to run after a fixed time interval, but since this Lambda only runs for a couple of minutes at most, I'm not sure how I should proceed. I was thinking of simply setting the health check status based on the individual reader and writer steps somehow. For instance, if the job was able to read successfully, return status UP; otherwise, return some other status.
I also want to note that the health of this app will need to be documented in Splunk via logs.
Please let me know if there is a better solution. I'm new to health checks so my implementation might be incorrect.
I am running the STS AssumeRole operation from inside a Lambda function and experiencing weird behaviour. My Lambda function runs as a dedicated role, call it LambdaRole, and I'm trying to assume a second role (call it S3Role) in order to get credentials for S3 access that I can pass to another system. This other system doesn't have an IAM role attached, and I'd rather not generate static keys for it.
The operation sometimes succeeds upon first deploying my Lambda function and continues to work for a while, but eventually stops working. The 'stopped working' is simply a timeout where the service call never returns. Sometimes a fresh deployment of my Lambda function doesn't succeed even for the first call.
I've tried exploring any rate limits etc. for STS but don't see any that are relevant. I can call AssumeRole from the CLI as many times as I want and it's fast and responsive.
My Lambda function runs inside a VPC, and I've tried with and without an endpoint to STS (apparently you do not need an STS endpoint inside your VPC, which makes some sense).
So, in summary: is there any extra intelligence happening during the AssumeRole operation that is causing this problem? Is something special or different happening in the Lambda container that causes this to break? Any debugging ideas?
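For reference, a sketch of the kind of AssumeRole call being described (AWS SDK for JavaScript v2; the role ARN and session name are placeholders, and the short socket timeouts are only there so a hang fails fast instead of never returning):
const AWS = require('aws-sdk');
// short socket timeouts so a hanging call fails quickly instead of never returning
const sts = new AWS.STS({ httpOptions: { connectTimeout: 5000, timeout: 5000 } });

exports.handler = async () => {
  const result = await sts.assumeRole({
    RoleArn: 'arn:aws:iam::123456789012:role/S3Role', // placeholder account id and role name
    RoleSessionName: 'lambda-s3-access',
    DurationSeconds: 900,
  }).promise();
  // temporary credentials to hand off: AccessKeyId, SecretAccessKey, SessionToken, Expiration
  return result.Credentials;
};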
We've been running our production web app off AWS Lambda / API Gateway, with an Aurora serverless database. Things had been running smoothly for over a year, but recently (coinciding with much increased periods of peak usage) we've experienced temporary slowness, and in the worst case unavailability, due to some kind of bottleneck that results in a spike in the number of DB connections and 4XX and 5XX from our two APIs.
We're using the serverless-mysql library to execute queries and manage DB connections.
Some potential causes of the issue that have been eliminated:
There are no long-running queries locking up tables or anything of that sort (as demonstrated by SHOW FULL PROCESSLIST in MySQL); in fact, no query runs longer than 1s according to our slow_log
All calls to await serverlessMysql.query() are immediately followed by await serverlessMysql.end()
Our database manager class is instantiated outside the Lambda handler, so it isn't reinstantiated every time a Lambda instance is reused
We've adjusted the config options for serverless-mysql so that retries aren't so aggressive. The default config makes it very aggressive in retrying to connect, both in frequency and number of retries. This has definitely helped, but has not eliminated the problem (a sketch of this setup follows below).
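For context, a minimal sketch of the setup described above (connection details and retry values are illustrative only, not our actual configuration):
const mysql = require('serverless-mysql')({
  config: {
    host: process.env.DB_HOST,
    database: process.env.DB_NAME,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
  },
  maxRetries: 5,   // far fewer retries than the default
  backoff: 'full', // full-jitter backoff between retries
  base: 50,        // base delay in ms
  cap: 1000,       // maximum delay in ms
});

exports.handler = async (event) => {
  const rows = await mysql.query('SELECT 1'); // placeholder query
  await mysql.end(); // release the connection for reuse by other invocations
  return rows;
};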
What details can I post that might help someone diagnose this problem? It's a major pain in the ass.
It would be helpful to see the load this application is getting, which I know is easier said than done with Lambda.
You sort of hinted at it, but it's possible you're hitting the max connections limit of the capacity class your Aurora Serverless instance is set to. I've hit this a few times. It's hard to discover with Lambda and Aurora Serverless because you don't have the same logging you would traditionally have.
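One quick sanity check (a sketch; connection details are placeholders) is to ask the database itself for its connection ceiling and current usage, since max_connections scales with the capacity setting:
const mysql = require('serverless-mysql')({
  config: { host: process.env.DB_HOST, user: process.env.DB_USER, password: process.env.DB_PASSWORD },
});

exports.handler = async () => {
  const limit = await mysql.query('SELECT @@max_connections AS max_connections');
  const inUse = await mysql.query("SHOW STATUS LIKE 'Threads_connected'");
  await mysql.end();
  return { limit, inUse }; // compare these during a traffic spike
};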
Outside of that, the core issue you're experiencing seems to be related to spikes created by your application, so you need to discover whether a query is perhaps just inefficient and running too many times at once. These are almost impossible to troubleshoot with Lambda logs, but DB locks still occur with Aurora Serverless.
To help track down the issue, you could try the following:
Set up an APM
I highly, highly recommend getting something like New Relic set up and monitoring your Lambda function.
I'm pretty sure NR has a free trial option, and tracking down a problem like this would be relatively simple with an APM. I can't tell you how much easier problems like this are to solve with a solid APM.
Monitor traffic ingress
Again, I'm not sure what this application is doing, but it could be that a spike in network traffic from a particular user kicks off a load of queries that makes things go awry. Set up a free Cloudflare account or some other proxy if you can, so you can inspect network traffic more easily.
Hope this helps.
I'm not sure why, but from time to time - once in every 20 Lambda calls - I receive an error:
Connection timed out after 120000ms
The calls are made from an ECS container, and both the caller and the Lambda are written in Node.js.
What should I check?
I know this is an old post, but I'll write how I solved a situation with the same error message in a lambda that I was working on. I hope this helps someone with a similar issue.
In my case, I also have a web app inside an EC2 instance which calls a Lambda through lambda.invoke() (npm aws-sdk). Both the EC2 app and the Lambda run on Node.js. Even though the error is logged inside the EC2 instance, the message is thrown by the Lambda itself to the caller (EC2).
My Lambda makes ~3,000 requests to an API, which takes ~5 minutes (300,000 ms) to get all the responses back. It seems that the Node.js AWS SDK keeps a socket alive during the Lambda invocation, with a default inactivity timeout of 120,000 ms (2 minutes). As the Lambda code keeps running for longer than this threshold, the error is thrown and returned to the caller via the callback.
According to the AWS JS SDK documentation, the AWS config object has a parameter for the HTTP timeout:
httpOptions (map) — A set of options to pass to the low-level HTTP request. Currently supported options are:
timeout [Integer] — Sets the socket to timeout after timeout milliseconds of inactivity on the socket. Defaults to two minutes (120000).
After I changed this configuration to 360,000 ms (6 minutes), the lambda executes successfully. So you can just set this parameter to a higher value, according to your needs:
AWS.config.update({httpOptions: {timeout: 360000}});
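If you'd rather not change the global config, the same option can also be set on just the client performing the invoke (a sketch; the function name and payload are placeholders):
const AWS = require('aws-sdk');
// allow up to 6 minutes of socket inactivity before timing out
const lambda = new AWS.Lambda({ httpOptions: { timeout: 360000 } });

async function callLongRunningLambda() {
  const result = await lambda.invoke({
    FunctionName: 'my-long-running-function', // placeholder name
    Payload: JSON.stringify({ some: 'input' }),
  }).promise();
  return JSON.parse(result.Payload); // assumes the Lambda returns JSON
}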
For me it was not occasional; it went from working one moment to failing the next. :S
Somehow, one of my network devices had been deactivated (PANGP Virtual Ethernet Adapter); once I re-activated it, everything worked again.
Best!
I have a handler function on AWS Lambda that is connecting to a Redis instance to store a single key in the cache. The function has completed successfully but the key in Redis shows up minutes (or more) after the fact.
This behavior is observable on both Heroku Redis and Redis Cloud, they're both hosted solutions.
I can't for the life of me figure out what's causing this lag. My Redis knowledge is practically zero, I know how to store a list using LPUSH and how to trim that list using LTRIM.
The writer to Redis uses this Node client while I observe the lag using redis-cli on my local machine.
Is it common to experience this kind of lag in the setup I describe? What can I do to debug this?
I'm purposefully ignoring most of the information in the question and would like to refer only to the alleged symptom, namely that
the key shows up only minutes after being stored
This behavior is impossible with Redis - any change to the data is immediately visible given Redis' design. That said, the only scenario in which what you're describing could be remotely possible is when you're writing to a Redis master server and reading from a very badly lagged replica. I can assure you that this is not the case with Redis Cloud, however.
The main reason is that the Lambda container goes to sleep as soon as your function terminates, and the Redis client you are using exposes only asynchronous APIs.
Note that the API is entirely asynchronous. To get data back from the server, you'll need to use a callback.
I'm assuming that the asynchronous SET is the last action performed in your Lambda function. Once it is called, the underlying Lambda container goes to sleep, and most likely the actual SET hasn't finished its job yet. Therefore, the record will not show up in Redis until the exact same Lambda container is invoked again to execute your function and finishes the job it was supposed to finish on the previous execution. This is probably the lag you are experiencing.
To test whether or not this is true, add a sleep of a couple of seconds at the end of your function to delay the Lambda container going to sleep immediately, and see if the lag is still there.
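Rather than sleeping, the more direct fix is to make sure the write has actually completed before the handler returns. A minimal sketch (assuming the node-redis v4 promise API, which may not match the client used in the question; the key and connection string are placeholders):
const { createClient } = require('redis');
const client = createClient({ url: process.env.REDIS_URL }); // placeholder connection string

exports.handler = async (event) => {
  if (!client.isOpen) await client.connect();
  // await the write so the container isn't frozen before the SET reaches Redis
  await client.set('last-event', JSON.stringify(event));
  return { ok: true };
};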
I would also recommend not relying on asynchronous, fire-and-forget behaviour inside Lambda functions. It adds state to your Lambda computation, and this is not recommended by AWS themselves in the Lambda documentation either.