What happens to leftover files created by lambda function - aws-lambda

If I write a file to disk inside of a Lambda function, what happens to it after the function finishes? Do I have to remove it explicitly, or will Amazon automatically delete everything after the function finishes running?

Lambda functions that you execute on AWS run within an isolated space called a container, which is provisioned just for you and that function. AWS may not clean up this container immediately, so that subsequent executions of your Lambda function are faster (the container is already provisioned).
When your Lambda function is not executed for "an amount of time" the container will be cleaned up by AWS. If you publish a revision of your code then old containers are cleaned up and a new one is provisioned for your Lambda function on next execution.
What is important to keep in mind is that the files you mention (for example, anything you write to /tmp) and any variables you declare outside of the handler code will still be present on subsequent executions that reuse the same container.
Knowing that this is the case you should also consider redesigning your code to ensure a clean exit (even under a failure condition) if "left-overs" from past executions could cause you some conflict.
It's also important to make sure that you never assume a container will still exist on next execution.
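As a minimal sketch of the "clean exit" idea above (the handler name and the returned fields are illustrative assumptions, not part of any AWS API), a Python handler can create its scratch file in the temp directory and remove it in a finally block, so nothing is left over even when the function fails. The module-level counter shows how state declared outside the handler survives while the container is reused:

```python
import os
import tempfile

# Module-level state: kept between invocations for as long as the
# container stays warm, and reset when a new container is provisioned.
invocation_count = 0


def handler(event, context):
    global invocation_count
    invocation_count += 1

    # mkstemp creates a uniquely named file in the default temp directory
    # (this resolves to /tmp on Lambda).
    fd, path = tempfile.mkstemp(prefix="work-", suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(event))
        size = os.path.getsize(path)
        return {"invocation": invocation_count, "bytes_written": size, "path": path}
    finally:
        # Runs on success and on failure, so the next invocation in the
        # same container does not see this file.
        if os.path.exists(path):
            os.remove(path)
```

Because the counter depends on container reuse, it should never be relied on for correctness; it only illustrates that such state can persist.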
You can check out some official documentation on this here:
http://docs.aws.amazon.com/lambda/latest/dg/lambda-introduction.html
I hope this helps!

Related

How to run DBT in AWS Lambda?

I have currently dockerized my DBT solution and I launch it in AWS Fargate (triggered from Airflow). However, Fargate requires about 1 minute to start running (image pull + resource provisioning + etc.), which is great for long running executions (hours), but not for short ones (1-5 minutes).
I'm trying to run my docker container in AWS Lambda instead of in AWS Fargate for short executions, but I encountered several problems during this migration.
The one I cannot fix is related to the message below, which appears when running dbt deps --profiles-dir . && dbt run -t my_target --profiles-dir . --select my_model
Running with dbt=0.21.0
Encountered an error:
[Errno 38] Function not implemented
It says a function is not implemented, but I cannot see anywhere which function that is. As the error appears when installing the dbt packages (redshift and dbt_utils), I tried to download them and include them in the docker image (setting local paths in packages.yml), but nothing changed. Moreover, DBT writes no logs at this phase (I set the log-path to /tmp in dbt_project.yml so that it has write permissions within the Lambda), so I'm blind.
Digging into this problem, I've found that it may be related to multiprocessing issues within AWS Lambda (my docker image contains python scripts), as stated in https://github.com/dbt-labs/dbt-core/issues/2992. I run DBT from python using the subprocess library.
Since it may be a multiprocessing issue, I have also tried to set "threads": 1 in profiles.yml but it did not solve the problem.
Has anyone succeeded in deploying DBT in AWS Lambda?
I've recently been trying to do this, and the summary of what I've found is that it seems to be possible, but isn't worth it.
You can pretty easily build a Lambda Layer that includes dbt & the provider you want to use, but you'll also need to patch the multiprocessing behavior and invoke dbt.main from within the Lambda code. Once you've jumped through all those hoops, you're left with a dbt instance that is limited to a relatively small upper bound on memory, a 15 minute maximum runtime, and is throttled to a single thread.
This discussion gives a rough example of what's needed to get it running in Lambda: https://github.com/dbt-labs/dbt-core/issues/2992#issuecomment-919288906
All that said, I'd love to put dbt on a Lambda and I hope dbt's multiprocessing will one day support it.
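To check whether the Errno 38 really comes from multiprocessing rather than from dbt itself, a small diagnostic like the following could be run inside the Lambda. This is only a sketch under the assumption (suggested by the linked issue) that creating a multiprocessing lock is what fails in that environment; the function name is made up for illustration:

```python
import multiprocessing


def semaphore_errno():
    """Try to create a multiprocessing lock.

    Returns None if the lock can be created, otherwise the errno of the
    OSError that was raised (38 is ENOSYS, "Function not implemented",
    on Linux).
    """
    try:
        multiprocessing.Lock()
        return None
    except OSError as exc:
        return exc.errno


if __name__ == "__main__":
    print("multiprocessing lock errno:", semaphore_errno())
```

If this reports 38 in the Lambda environment but None locally, the error is coming from the platform's lack of semaphore support and not from the dbt packages being installed.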

middy-ssm not picking up changes to the lambda's execution role

We're using middy-ssm to fetch & cache SSM parameter values during lambda initialization. We ran into a situation where the execution role of the lambda did not have access to perform SSM::GetParameters on the path that it attempted to fetch. We updated a policy on the role to allow access, but the lambda function appeared never to pick up the permission change; instead it kept failing due to missing permissions until the end of its lifecycle (closer to 1 hour, as requests kept coming to it).
I then did a test where I fetched parameters using both the AWS SDK directly and middy-ssm. Initially the lambda role didn't have permissions and both methods failed. We updated the policy and after a couple of minutes, the code that used the SDK was able to retrieve the parameter, but the middy middleware kept failing.
I tried to interpret the implementation of middy-ssm to figure out if the error result is somehow cached or what is going on there, but couldn't really pinpoint the issue. Any insight and/or suggestions how to overcome this are welcome! Thanks!
So as pointed out by Will in the comments, this turned out to be a bug.

Cannot delete Lambda@Edge created by CloudFormation

I cannot delete a Lambda@Edge function created by CloudFormation. During the CloudFormation creation process an error occurred and the rollback was executed, but in the end the Lambda that had been created could not be removed. We resolved the CloudFormation problem and renamed the resource, and CloudFormation created a new Lambda, but the old one is still there. No CloudFront distribution or other resource is linked to the old Lambda, and we still can't remove it. When we try, we receive this message:
An error occurred when deleting your function: Lambda was unable to
delete
arn:aws:lambda:us-east-1:326353638202:function:web-comp-cloud-front-fn-prod:2
because it is a replicated function. Please see our documentation for
Deleting Lambda@Edge Functions and Replicas.
I know that if no resources are linked to a Lambda@Edge function, the replicas are deleted after some minutes. But we can't find any linked resources.
Thank you in advance for your help.
I had a similar issue where I simply wasn't able to delete a Lambda@Edge, and the following helped:
Create a new CloudFront distribution, and associate your Lambda@Edge with this new distribution.
Wait for the distribution to be fully deployed.
Remove the association of your Lambda@Edge from the CloudFront distribution that you just created.
Wait for the distribution to be fully deployed.
Additionally, wait for a few more minutes.
Then, try to delete your Lambda@Edge.
The error message clearly indicates that the function is still replicated at the edge, which is why you cannot delete it. So you first have to remove the Lambda@Edge association before deleting the function. If they are created in the same stack, the easiest way is probably to set the Lambda function's DeletionPolicy to Retain and to remove it manually afterwards.
Keep in mind that it can take up to a few hours, not just a few minutes, before the replicas are deleted. Usually, I just wait until the next day to remove them.
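The "wait, then try again" advice above can be automated with a small retry loop. This is a rough sketch, not an official procedure: the function name and default timings are assumptions, and it recognises the situation only by looking for the phrase "replicated function" from the error message quoted in the question:

```python
import time


def delete_when_replicas_gone(delete_fn, attempts=12, wait_seconds=600, sleep=time.sleep):
    """Call delete_fn until it stops failing with the replicated-function error.

    delete_fn is any zero-argument callable that performs the delete.
    Returns the number of the attempt that succeeded. Any other error, or
    the replicated-function error on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            delete_fn()
            return attempt
        except Exception as exc:
            still_replicated = "replicated function" in str(exc)
            if not still_replicated or attempt == attempts:
                raise
            # Replicas have not been removed yet; wait before retrying.
            sleep(wait_seconds)
```

With boto3 this might be called as delete_when_replicas_gone(lambda: client.delete_function(FunctionName="my-edge-fn")), where client is a boto3 Lambda client for us-east-1 and the function name is a placeholder.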

Laravel: How to detect if code is being executed from within a queued job, as opposed to manually run from the CLI

I found this similar question: How to check If the current app process is running within a queue environment in Laravel
But actually this is the opposite of what I want. I want to be able to distinguish between code being executed manually from an artisan command launched on the CLI, and a job being run as a result of a POST trigger via a controller, or a scheduled run.
Basically, I want to distinguish between a job being run via the SYNC driver, manually triggered by the developer with eyes on the CLI output, and every other case.
app()->runningInConsole() returns true in both cases, so it is not useful to me.
Is there another way to detect this? For example, is there a way to detect the currently used queue connection? Keep in mind that it's possible to change the queue connection at runtime, so just checking the value in the env file is not enough.

Trigger AWS Lambda function whenever a new file arrived on two different s3 prefixes

Every day we receive one incremental file from each of multiple sources. The two sources place their files under two different S3 prefixes, and the files arrive at different times. We want to process both files in one go and generate a report from them. For this I will be using AWS Lambda and Data Pipeline: Lambda will be triggered whenever a new file arrives, and it will in turn start the AWS Data Pipeline.
We were able to do this with a single source: we created an S3 event trigger for the Lambda, and whenever the file arrives, the Lambda is triggered and starts the pipeline, the EMR activity runs, and at the end the report is generated.
Now we have the second source as well, and we want to start the activity only once both files have arrived/been uploaded.
I'm not sure whether AWS Lambda can be triggered on more than one dependency. I know this can be done through Step Functions, and I might go that route if Lambda doesn't support being triggered on multiple dependencies.
In short: trigger the AWS Lambda function only when new files have arrived on both S3 prefixes. Don't trigger it if a file has arrived on only one prefix and not the other.
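One possible approach, sketched here under several assumptions, is to keep an S3 trigger on both prefixes and let the function itself decide whether to start the pipeline: on every invocation it checks that each prefix already holds a file for the day, and returns early otherwise. The prefix names, the per-day folder layout and the start_pipeline callback are all hypothetical; only list_objects_v2 is a real boto3 S3 client call:

```python
from datetime import datetime, timezone

# Hypothetical layout: one folder per source, one subfolder per day.
SOURCE_PREFIXES = ("source-a/", "source-b/")


def prefix_has_object(s3_client, bucket, prefix):
    """Return True if at least one object exists under the given prefix."""
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0


def all_sources_arrived(s3_client, bucket, day):
    """Return True only when every source has a file for the given day."""
    return all(
        prefix_has_object(s3_client, bucket, f"{source}{day}/")
        for source in SOURCE_PREFIXES
    )


def handler(event, context, s3_client=None, start_pipeline=None, day=None):
    # Lambda passes only event and context; the extra parameters exist so
    # the function can be exercised locally with stand-ins.
    if s3_client is None:
        import boto3  # imported lazily so the sketch loads without boto3
        s3_client = boto3.client("s3")
    if day is None:
        day = datetime.now(timezone.utc).strftime("%Y-%m-%d")

    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    if not all_sources_arrived(s3_client, bucket, day):
        # Only one source has delivered so far; do nothing yet.
        return {"started": False}

    if start_pipeline is not None:
        start_pipeline()  # placeholder for activating the Data Pipeline
    return {"started": True}
```

If both files land at almost the same moment, both invocations may see both prefixes populated and start the pipeline twice, so a real implementation would need some form of deduplication; Step Functions remains the more robust option for that.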
