Lambda function failed to stabilize since it is in InProgress state - aws-lambda

We've been experiencing issues with DockerImageFunction. All deployments fail with the following (and, to me, very cryptic) error:
Lambda function XXX failed to stabilize since it is in InProgress state
Other functions deploy without any problems. Deployments were working fine until Wednesday, 07.04.2021; since then, they fail every time. We haven't changed anything in our CDK code, in that function's Dockerfile, or in its code.
We deploy using the CDK with TypeScript. I tested with 1.93 and 1.97 (the latest version at the time of this writing).
Any clue?

This looks to be a bug that Amazon will need to resolve. However...
In the event someone needs a temporary workaround to deploy their CDK stack which includes a DockerImageFunction and they do not want to delete the whole stack first (perhaps because some resources are S3 buckets with important data), here are some steps that worked for me. This assumes your stack is in the state described above, i.e. an update has failed, the system attempted a rollback, and then the update rollback also failed.
1. From the CloudFormation console, select "Continue update rollback".
2. Select "Advanced options" and choose to skip the function or functions that use the containerized deployment (i.e. the DockerImageFunctions).
3. The rollback should now complete successfully.
4. If you try to deploy again now, the stack will just return to the UPDATE_ROLLBACK_FAILED state, so don't bother. Instead, comment out all the code that instantiates and references the DockerImageFunctions in your CDK stack class, then perform the deployment, which should remove those functions and their various roles and permissions from the CloudFormation stack.
5. Once this is complete, you can uncomment all the stack code you just commented out and perform a final deploy. This one should succeed; it did for me at least, and the latest version of my application is now deployed.
It seems likely that if I perform another deploy after this, the same error will occur and I will have to go through these five steps again. I haven't tried it yet. But at least this is a workaround, however clumsy.
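For reference, the first two console steps above also have a CLI equivalent; a minimal sketch, assuming a hypothetical stack named MyStack and a function with logical ID MyDockerFunction:

# Continue the rollback while skipping the stuck containerized function
# (the stack name and logical resource ID below are placeholders)
aws cloudformation continue-update-rollback \
  --stack-name MyStack \
  --resources-to-skip MyDockerFunction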

FYI - this issue should now be resolved. Confirmed with AWS support and against our own stacks.

Related

CloudFormation returns ROLLBACK_IN_PROGRESS

I attempted to deploy an OpenSwan VPN server to a virtual machine, as described in Amazon Web Services in Action by Michael Wittig. Here is the remote repository: https://github.com/LaVie-environment/awsWebservices
When I create the stack, expecting it to bring up the OpenSwan VPN server, it goes into ROLLBACK_IN_PROGRESS instead. I inspected the stack events with the command below:
aws cloudformation describe-stack-events --stack-name openvpn
When your stack fails to deploy for the first time (a "create" rather than an "update"), it cannot roll back to a known stable state. Therefore, the only option you have is to remove it altogether and deploy again.
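A minimal sketch of that recovery, assuming the stack is named openvpn (the template file name here is a placeholder):

# Delete the failed stack and wait for the deletion to finish
aws cloudformation delete-stack --stack-name openvpn
aws cloudformation wait stack-delete-complete --stack-name openvpn

# Re-create the stack from your template
aws cloudformation create-stack --stack-name openvpn \
  --template-body file://vpn-template.yaml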
Some hints:
Although I also prefer the CLI in most cases, the CloudFormation UI is quite helpful for watching how your stack is created and, in case of errors, analysing what went wrong.
If you have a very complex stack and/or you are trying a few things out, it's sometimes more convenient to deploy only part of your stack at first, because it's easier to incrementally update a stack than to create and re-create it over and over. Simply comment out the building blocks you don't need from the start and then uncomment them one by one.

AWS CodeBuild: Monorepo and multiple builds?

I have an AWS CodePipeline which uses CodeBuild as the build step and deploys Lambda functions. This pipeline is triggered upon any commit on the development branch which houses multiple Lambda functions. Right now, since all these Lambdas use the same pipeline, they have the same build job as well.
The problem is: what happens if one of my Lambdas has a different requirement in the build step (say, installing a library)? Is there any way to trigger a different build job for a specific Lambda? I am guessing this delves into the age-old issue of CodePipeline being unable to deal with monorepos, but any suggestions are welcome.
You could integrate change detection for your lambda functions. The only thing you need is to check out the source separately in the job so that you have the .git folder (see: https://forums.aws.amazon.com/thread.jspa?threadID=251732).
Afterwards you can easily check with git which lambda function actually changed and run your pre-build commands based on the result.
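A rough sketch of what that check could look like in your pre-build commands; the lambdas/ directory layout and the per-function build.sh script are assumptions, not something from the question:

# List the files changed by the commit that triggered the build
CHANGED_FILES=$(git diff --name-only HEAD~1 HEAD)

# Run the build only for functions whose source actually changed
for fn_dir in lambdas/*/; do
  if echo "$CHANGED_FILES" | grep -q "^${fn_dir}"; then
    echo "Changes detected in ${fn_dir}, running its build"
    (cd "${fn_dir}" && ./build.sh)
  fi
done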

Deploying Single Lambda Function From CI/CD pipeline

I am dealing with an infrastructure and trying to figure out how to deploy just a single lambda from a CI/CD pipeline.
Let's say you have 20 lambdas in a repo and you made a change to one single lambda. Instead of deploying all of them, I just want to deploy the changed one, to cut down the deployment time.
I've got an idea of checking the difference with git to figure out which ones changed, and deploying only that part of the functionality, but that surely doesn't seem like the right way to do it. I believe there is a more proper way.
I am using Terraform for now (moving to the Serverless Framework). I know that Terraform and the Serverless Framework hold their state in an S3 bucket. However, in my case, when I run it through the pipelines, even though there is a Terraform state and there is no change in that state, it still deploys the whole thing as far as I can tell (I might be wrong). I just want to clear up my mind about how people do this with their pipelines.
Since you seem to be asking about both Terraform and Serverless Framework here, I'm assuming you're looking for a general answer rather than specifically how this would be solved with a particular tool.
One way to solve this problem is to decouple your build process from your deploy process by adding a version selection mechanism in between. This just means that somewhere in your system you have a value that can be written by your build process and read by your deploy process which indicates what is the "current" artifact for each of your Lambda functions.
When your build process completes successfully, it can write the information about the artifact it built into the appropriate location, and then trigger your deployment process. Your deployment process will then read the artifact information and use it to decide what to deploy.
If you have made no changes to the current artifact metadata for a particular function then the deploy process can see that and not do anything. If a particular artifact is flawed in some way and you only notice once it's deployed, you can potentially set the artifact metadata back to the previous one and re-run the deployment process to roll back. If you choose a data store that retains historical versions, you'll also have a log of changes to the current artifact which might be useful to understand circumstances that lead to an incident.
Without getting into specifics it's hard to say more about this. For Terraform in particular, the artifact metadata store ought to be something that Terraform can read using a data source. To show a real example I'm going to just arbitrarily choose AWS SSM Parameter Store as a location for that artifact metadata store:
data "aws_ssm_parameter" "foo" {
name = "FooFunctionArtifact"
}
locals {
# For this example, we'll assume that the stored parameter is a JSON
# string shaped like this:
# {
# "s3_bucket": "awesomecorp-app-artifacts"
# "s3_key": "/awesomeapp/v1.2.0/function.zip"
# }
foo_artifact = jsondecode(data.aws_ssm_parameter.foo)
}
resource "aws_lambda_function" "foo" {
function_name = "foo"
s3_bucket = local.foo_artifact.s3_bucket
s3_key = local.foo_artifact.s3_key
# etc, etc
}
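On the build side, publishing that metadata could be a single CLI call; a sketch, reusing the bucket and key from the example JSON above:

# Write the current artifact metadata after a successful build
aws ssm put-parameter \
  --name FooFunctionArtifact \
  --type String \
  --overwrite \
  --value '{"s3_bucket":"awesomecorp-app-artifacts","s3_key":"/awesomeapp/v1.2.0/function.zip"}'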
The technical details of this will vary a lot depending on your technology choices. If you don't use Terraform then you'll either use a feature similar to data sources in your other tool or you'd write some wrapper glue code that can itself retrieve the necessary information and pass it into the tool as an argument.
The main thing, regardless of technology choices, is that there is an explicit record somewhere of what is the latest artifact for each function, which is updated by your build step and read by your deploy step. This pattern can apply to other artifact types too, such as AMIs for EC2, docker images, etc.
It seems you have added the tags terraform, serverless-framework (I'll call it sls), and aws-lambda, so any of them could work for you.
terraform - Terraform itself will take care of detecting which lambdas need to be updated. But it is not lambda-friendly if you need to install related packages.
serverless framework (sls) - it is good for managing lambda functions, but as a side effect it has to be managed together with API Gateway. I am not sure if the sls team has fixed this issue or not; that needs confirmation.
sls will take care of installing related packages.
The bad part is that sls can't diff the resources to be deployed against what was planned.
cloudformation - that's the AWS-owned Infrastructure as Code (IaC) tool for managing AWS resources; you should be fine using it to manage the lambda resource. You will get the same issue as with Terraform: you have to install the related packages before deploying the stack.
The bad part is that cfn (CloudFormation) doesn't have a diff feature either; furthermore, it doesn't have proper tooling around its AWS CLI commands, so you have to use something else, such as shell scripting, Ansible, or even Terraform, to manage CloudFormation template updates.
aws cdk - the newest option is the AWS CDK. It does have a diff feature (cdk diff), which is the best fit for your current job, but it is a very new project and a lot of features are still waiting to be developed.
Take these points and weigh them against your team's skill set. Always choose the tools you and your team are most confident with.
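To make the diff point concrete, both of these preview commands exist, while sls and plain CloudFormation have no direct equivalent (MyStack is a placeholder):

terraform plan      # lists exactly which resources would change, e.g. a single lambda
cdk diff MyStack    # compares the synthesized template against the deployed stack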

What’s the best way to deploy multiple lambda functions from a single github repo onto AWS?

I have a single repository on GitHub that hosts my lambda functions. I would like to be able to deploy the new versions whenever new logic is pushed to master.
I did a lot of research and found a few different approaches, but nothing really clear. I would like to know what others feel would be the best way to go about this, and maybe some detail (if possible) on how that pipeline is set up.
Thanks
You can set up a CI/CD pipeline using CircleCI with its GitHub integration (it is an online service, so you don't need to maintain anything yourself, like a Jenkins server, for example).
Upon every commit to your repository, a CircleCI build will be triggered. Once the build process is over, you can run sls deploy or sam deploy, use Terraform, or even create a script that uploads the .zip file from your GitHub repo to an S3 bucket and then, within that script, invokes the create-function command. There's an example of how to deploy Serverless applications using CircleCI along with the Serverless Framework here.
Other options include TravisCI, AWS CodeDeploy, or even maintaining your own CI/CD server. The same logic applies to all of these tools though: commit -> build -> deploy (using one of the tools you've chosen).
EDIT: After #Matt's answer, it clicked that the OP never mentioned the Serverless Framework (I somehow thought they were already using it, which is why I pointed them to tutorials that use it). I've updated my answer with a few other options for serverless deployment.
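As a sketch of the script-based option mentioned above (the function name, bucket, role ARN, runtime, and handler are all placeholders):

# Package the function code and upload the artifact
zip -r function.zip .
aws s3 cp function.zip s3://my-artifact-bucket/my-function/function.zip

# Create the function from the uploaded artifact
aws lambda create-function \
  --function-name my-function \
  --runtime nodejs14.x \
  --role arn:aws:iam::123456789012:role/my-lambda-role \
  --handler index.handler \
  --code S3Bucket=my-artifact-bucket,S3Key=my-function/function.zip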
I know that this isn't exactly what you asked for, but I use the Serverless Framework (https://serverless.com) for deployment and I love it. I don't do my deployments when I push to my repo. Instead, I push to my repo after I've deployed. I like this flow because a deployment can fail due to so many things, while pushing to GitHub is much less likely to fail. This way, I avoid pushing code that failed to deploy to my master branch.
I don't know if you're familiar with the framework, but it is super simple. The website describes the simple steps to create and deploy a function like this:
# Step 1. Install serverless globally
$ npm install serverless -g

# Step 2. Create a serverless function
$ serverless create --template hello-world

# Step 3. Deploy to cloud provider
$ serverless deploy

# Your function is deployed!
$ http://xyz.amazonaws.com/hello-world
There are also a number of plugins you can use to integrate easily with custom domains on APIGateway, prune older versions of lambda functions that might be filling up your limits, etc...
Overall, I've found it to be the easiest way to manage and deploy my lambdas. Hope it helps!
Given that you're using AWS Lambda, you may want to consider CodePipeline to automate your release process. SAM (https://docs.aws.amazon.com/lambda/latest/dg/serverless_app.html) may also be interesting.
I too had the same problem: I wanted to manage 12 lambdas with one git repository. I solved it by introducing Travis CI, which saved time and is really useful in many ways. We can check the logs whenever we want, and you can share the logs with anyone by sharing the URL. The sample documentation of all the steps can be found here. 👍

TFS Deployment times out, when using an existing Build

I'm trying to deploy an existing build with the "LabDefaultTemplate.11.xaml".
My problem is that the build times out as soon as I use an existing build. Here are the last steps, including details and the timeout exception:
See http://i.stack.imgur.com/po1i6.png
I have two different servers. The first has TFS 2013 with the Build Service, Controller, and Agent installed on it. The second is intended for testing and has a Test Controller and Agent on it (configured as a Standard Environment in Microsoft Test Manager).
The Build Service account is a domain admin.
The build's connection to TFS uses a TFS admin account.
The Test Controller service login account is a local admin (mirrored on the build server); I earlier tried it with the domain admin.
The Test Controller's TFS connection also uses a TFS admin account.
The Test Controller lab service account is not used; earlier I also tried it with the domain admin.
When I set the build to use the latest TFS build, it runs into the timeout.
And when I set the path to use a build from a specific location (the build directory on the build server), it all just works fine.
The difference between a working build and the timeout described above can be seen in this picture: http://i.stack.imgur.com/gPM07.png
Does anyone have an idea where I'm going wrong?
The problem was that my previous builds only "partially succeeded" because they had some failing unit tests. The "use the latest build" setting only considers fully successful builds, so the latest build that was actually used had no drop folder configured.
I received no info about that until I saw it in the logs. My mistake was never checking which build was really being used as the latest.
