Lambda in VPC deletion takes more time - aws-lambda

I have created a stack that runs a Lambda function in a VPC using CloudFormation. When I try to delete the entire stack, it takes 40-45 minutes.
My IAM role has the following permissions:
- ec2:DescribeInstances
- ec2:CreateNetworkInterface
- ec2:AttachNetworkInterface
- ec2:DescribeNetworkInterfaces
- ec2:DeleteNetworkInterface
- ec2:DetachNetworkInterface
- ec2:ModifyNetworkInterfaceAttribute
- ec2:ResetNetworkInterfaceAttribute
- autoscaling:CompleteLifecycleAction
- iam:CreateRole
- iam:CreatePolicy
- iam:AttachRolePolicy
- iam:PassRole
- lambda:GetFunction
- lambda:ListFunctions
- lambda:CreateFunction
- lambda:DeleteFunction
- lambda:InvokeFunction
- lambda:GetFunctionConfiguration
- lambda:UpdateFunctionConfiguration
- lambda:UpdateFunctionCode
- lambda:CreateAlias
- lambda:UpdateAlias
- lambda:GetAlias
- lambda:ListAliases
- lambda:ListVersionsByFunction
- logs:FilterLogEvents
- cloudwatch:GetMetricStatistics
How can I improve the deletion time of the stack?

When a Lambda function executes within your VPC, an Elastic Network Interface (ENI) is created in order to give it network access. You can think of an ENI as a virtual NIC. It has a MAC address and at least one private IP address, and is "plugged into" any resource that connects to the VPC network and has an IP address inside the VPC (EC2 instances, RDS instances, ELB, ALB, NLB, EFS, etc.).
While it does not appear to be explicitly documented, these interfaces as used by Lambda appear to be mapped 1:1 to container instances, each of which hosts one or more containers, depending on the size of each container's memory allocation. The algorithm Lambda uses for provisioning these machines is not documented, but there is a documented formula for approximating the number that Lambda will create:
You can use the following formula to approximately determine the ENI requirements.
Projected peak concurrent executions * (Memory in GB / 3GB)
This formula suggests that you will see more ENIs if you have either high concurrency or large memory footprints, and fewer ENIs if neither of those conditions is true. (The reason for the 3GB boundary seems to be the smallest instance Lambda appears to use in the background, which is the m3.medium general purpose EC2 instance. You can't see these among your EC2 instances, and you are not billed for them.)
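As a worked illustration of the formula above, here is a minimal Python sketch. The function name and the sample numbers are hypothetical, and rounding up to a whole interface is an assumption on my part; the documented formula only claims to be an approximation.

```python
import math

def estimate_enis(peak_concurrent_executions: int, memory_gb: float) -> int:
    """Approximate ENI count: peak concurrency * (memory in GB / 3 GB).

    Rounding up is an assumption; the documented formula is only approximate.
    """
    return math.ceil(peak_concurrent_executions * (memory_gb / 3.0))

# 100 concurrent executions at 1.5 GB each: 100 * 0.5 = 50
print(estimate_enis(100, 1.5))
# 10 concurrent executions at 0.5 GB each: about 1.67, rounded up to 2
print(estimate_enis(10, 0.5))
```

So a small, low-traffic function would be expected to hold only a couple of interfaces, while a busy or memory-heavy one could hold dozens, each of which has to be released before the stack deletion can finish.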
In any event, Lambda doesn't shut down containers or their host instances immediately after function execution, because it might need them for reuse on subsequent invocations; destroying them right away would be inefficient. Since containers (and their host instances) are not destroyed right away, neither are their associated ENIs. The delay is documented:
There is a delay between the time your Lambda function executes and ENI deletion.
This makes sense when we consider that the Lambda infrastructure's priority is making resources available as needed and keeping them available for performance reasons -- so tearing things down again is a secondary consideration that the service attends to in the background.
In short, this delay is normal and expected.
Presumably, CloudFormation has used tags to identify these interfaces, since it isn't readily apparent how to otherwise distinguish among them.
ENIs are visible in the EC2 console's left-hand navigation pane under Network Interfaces, so it's possible that you could delete these yourself and hasten the process... but note that this action, assuming the system allows it, needs to be undertaken with due caution. If you delete an ENI that is attached to a container instance that Lambda subsequently tries to use, Lambda will not know that the interface is missing, and the function will time out or throw an error, at least until Lambda decides to destroy the attached container instance.


What is the maximum outbound connections I can create from AWS Lambda?

I am looking at the documentation on Lambda Limits which says:
Number of file descriptors 1,024
I am wondering if this is per invoking lambda or total across all lambdas?
I am processing a very large number of items from a Kinesis stream and calling a web endpoint, and I seem to be hitting a bottleneck of about 1024 concurrent connections to the API. I'm not sure where the bottleneck is. I'm investigating limits on my load balancer and instances, but I'm also wondering whether Lambda itself simply cannot create more than 1024 concurrent outbound connections across all lambdas?
This question is old, but a suitable answer may help others in the future. The limit, as correctly noted in the question, is 1,024 outbound connections per Lambda function. However, this limit only applies for the life cycle of the container. There are currently no public documents stating the length of that life cycle, but my own testing showed the following:
A new container is created after 5 minutes of idle time for the Lambda function
A new container is created after 60 minutes of frequent use of the Lambda function
A new container is created on any update to the code or configuration of the Lambda
A final note on new containers: when a new container is created, it runs all of your code from the start, whereas invoking a warm container just invokes the handler, skipping the loading of libraries and so on. Because of this, it is a best practice to implement connection pooling and declare the connection outside of the handler so that it can be reused in subsequent invocations. Examples of this can be found in the AWS docs.
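A minimal sketch of declaring the connection outside the handler, assuming a Python function and a hypothetical HTTPS endpoint (`api.example.com` and the `/items` path are placeholders, not anything from the question). The connection object lives at module level, so a warm container reuses it instead of opening a new socket, and therefore a new file descriptor, on every invocation.

```python
import http.client

API_HOST = "api.example.com"  # hypothetical endpoint, replace with your own

# Module-level state is initialised once per container and survives
# across warm invocations of the handler.
_connection = None

def get_connection():
    """Create the connection on first use, then keep returning the same one."""
    global _connection
    if _connection is None:
        _connection = http.client.HTTPSConnection(API_HOST, timeout=5)
    return _connection

def handler(event, context):
    conn = get_connection()
    try:
        conn.request("GET", "/items")  # hypothetical path
        response = conn.getresponse()
        body = response.read()  # read fully so the connection can be reused
        return {"status": response.status, "bytes": len(body)}
    except (http.client.HTTPException, OSError):
        # Close the broken socket; http.client reopens it on the next request.
        conn.close()
        raise
```

A single shared connection is the simplest form of reuse. A function that issues many requests in parallel within one invocation would want a real pool, which this sketch does not attempt.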

Using reserved instances in an Elastic Beanstalk Load Balancer

I am running an Elastic Beanstalk load-balanced application for a year now. I'm looking for ways to cut back on costs and have discovered I could potentially use reserved ec2 instances instead of the On-Demand instances we are currently using. Currently, my load balancer uses two instances.
I want to make the switch but am unsure about how the process is actually done. I want everything to be crystal clear before doing anything.
From my understanding, if I reserve two instances of the same type as used in my app (t2.large with Linux) in the same availability zones (one in eu-west-1b, another in eu-west-1c), I could use these instances for the load balancer. Will the same-type instances I currently have deployed immediately fall under the Reserved Instance rates? Or will I have to rebuild my environment and build two new instances that match the reserved ones?
A Reserved Instance is a method of pre-paying for Amazon EC2 capacity.
If you were to buy two Reserved Instances (in your case, 2 x t2.large Linux), then for every hour of the year while the Reserved Instance is valid you will be entitled to run the matching instance types (2 x t2.large Linux) at no hourly charge.
There is no need to identify which instance is a Reserved Instance. Rather, the billing system will pick a matching instance that is running each hour and will not bill any hourly charges.
Therefore, if these are the only matching instances you are running, then they will (by default) be identified as Reserved Instances and will not receive hourly charges. If you run other instances, however, there is no way to control which instance(s) receive the pricing benefit.
It is possible to purchase a Reserved Instance with, or without, identifying the Availability Zone. If an AZ is selected, then the pricing benefit of the Reserved Instance only matches an instance running in that AZ, and there is also a capacity reservation to give you priority when running instances that match the Reserved Instance. If no AZ is selected, then the pricing benefit applies across any instances running in that region, but there is no capacity reservation.
Bottom line: Yes, it will apply immediately (for the number of instances for which you have purchased Reserved Instances). There is no need to start/stop/rebuild anything.
For anyone looking for a bit more certainty than John's (correct) answer, here's the official AWS docs on the subject:
In this scenario, you have a running On-Demand Instance (T2) in your account, for which you're currently paying On-Demand rates. You purchase a Reserved Instance that matches the attributes of your running instance, and the billing benefit is immediately applied. Next, you purchase a Reserved Instance for a C4 instance. You do not have any running instances in your account that match the attributes of this Reserved Instance. In the final step, you launch an instance that matches the attributes of the C4 Reserved Instance, and the billing benefit is immediately applied.

Spot instances termination

I'm planning to start using Amazon EC2 and, like everyone, I want to use Spot instances.
It will be for a minigames server, so Spot instances are perfect for this. Players enter, play the match and leave, so when a Spot instance is terminated because of spot price volatility, only the current match is cut short: barely any data loss, and perfectly acceptable when you save a lot of money.
Now, although players are going to be disconnected and reconnected to an On-Demand server when the spot price exceeds my maximum bid, I would like to know whether a force-terminated Spot instance runs the normal shutdown command or is simply "unplugged", leaving me no chance to disconnect players safely and save their data to the database (this would take just a few milliseconds).
As of 2015, Amazon now provides a 2-minute termination notice in the instance metadata.
A custom script can be written to poll for the termination notice and call web server graceful shutdown and associated cleanup scripts to ensure zero impact to end-users.
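A rough sketch of such a polling script in Python, using only the standard library. It assumes the notice is exposed at the `spot/termination-time` path of the instance metadata service, which returns an error until a termination is scheduled. The function names, the five-second interval and the `on_notice` callback are illustrative choices. If the instance is configured to require metadata session tokens, the request would also need a token header, which is omitted here.

```python
import time
import urllib.error
import urllib.request

NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def fetch_termination_time(url=NOTICE_URL, timeout=2.0):
    """Return the termination timestamp, or None if no notice is posted."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip() or None
    except (urllib.error.URLError, OSError):
        # No notice yet (HTTP error) or metadata service unreachable.
        return None

def wait_for_notice(on_notice, fetch=fetch_termination_time,
                    interval=5.0, max_polls=None):
    """Poll until a notice appears, then run the cleanup callback once."""
    polls = 0
    while max_polls is None or polls < max_polls:
        stamp = fetch()
        if stamp:
            on_notice(stamp)  # e.g. disconnect players, flush data to the DB
            return stamp
        polls += 1
        time.sleep(interval)
    return None
```

On the game server this would run in a background thread or a separate process, with `on_notice` triggering the graceful shutdown. The interval only needs to be comfortably shorter than the two-minute warning.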

Amazon EC2 Spot Alert

I use 1 spot instance and would like to be emailed when prices for my instance size and region are above a threshold. I can then take appropriate action and shut down and move instance to another region if needed. Any ideas on how to be alerted to the prices?
There are two ways to go about this that I can think of:
1) Since you only have one instance, you could set a CloudWatch alarm for your instance in a region that will notify you when the spot price rises above what you're willing to pay hourly.
If you create an alarm on the EstimatedCharges metric for the AmazonEC2 service and choose a period of an hour, CloudWatch will send you an email whenever the charges cross your threshold. Note that EstimatedCharges reports accumulated charges for the billing period rather than the current spot price, so this only approximates a price alert, and only while this instance is your sole EC2 cost.
Once you get the email, you can then shut the instance down and start one up in another region, and leave it running with its own alarm.
2) You could automate the whole process with a client program that polls for changes in the spot price for your instance size in your desired regions.
This has the advantage that you could go one step further and use the same program to trigger instance shutdowns when the price rises and start another instance in a different region.
Amazon recently released a sample program to detect changes in spot prices by region and instance type: How to Track Spot Instance Activity with the Spot-Notifications Sample Application.
Simply combine that with the ec2 command-line tools to stop and start instances and you don't need to manually do it yourself.
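The decision logic of such a client program might look like the sketch below. It deliberately leaves out the AWS calls: `prices_by_region` stands in for whatever your poller retrieves from the spot price history, and the region names, prices and function names are made-up illustrations.

```python
def cheapest_region(prices_by_region):
    """Return the region with the lowest current spot price."""
    return min(prices_by_region, key=prices_by_region.get)

def plan_move(current_region, prices_by_region, max_price):
    """Return the region to move to, or None if the instance should stay put.

    The instance stays while its current price is within max_price, or when
    no other region is actually cheaper.
    """
    current_price = prices_by_region[current_region]
    if current_price <= max_price:
        return None
    target = cheapest_region(prices_by_region)
    if target == current_region or prices_by_region[target] >= current_price:
        return None
    return target

# Illustrative prices in USD per hour (not real quotes).
prices = {"us-east-1": 0.12, "eu-west-1": 0.05, "us-west-2": 0.09}
print(plan_move("us-east-1", prices, max_price=0.08))  # moves to eu-west-1
print(plan_move("eu-west-1", prices, max_price=0.08))  # stays: None
```

The polling loop would call `plan_move` each time it refreshes the prices, then send the email or run the stop/start commands when a region is returned.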

Azure scalability over XML File

What is the best-practice solution for programmatically changing the XML file where the number of instances is defined? I know that this is somehow possible with csmanage.exe for the Windows Azure API.
How can I measure which Worker Role VMs are actually working? I asked this question on MSDN Community forums as well:
To modify the configuration, you might want to look at the PowerShell Azure Cmdlets. This really simplifies the task. For instance, here's a PowerShell snippet to increase the instance count of 'WebRole1' in Production by 1:
$cert = Get-Item cert:\CurrentUser\My\<YourCertThumbprint>
$sub = "<YourAzureSubscriptionId>"
$servicename = '<YourAzureServiceName>'
Get-HostedService $servicename -Certificate $cert -SubscriptionId $sub |
Get-Deployment -Slot Production |
Set-DeploymentConfiguration {$_.RolesConfiguration["WebRole1"].InstanceCount += 1}
Now, as far as actually monitoring system load and throughput: You'll need a combination of Azure API calls and performance counter data. For instance: you can request the number of messages currently in an Azure Queue:
You can also set up your role to capture specific performance counters. For example:
public override bool OnStart()
{
    var diagObj = DiagnosticMonitor.GetDefaultInitialConfiguration();
    // AddPerfCounter is a helper defined elsewhere in the role (not shown here)
    AddPerfCounter(diagObj, @"\Processor(*)\% Processor Time", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Request Execution Time", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Requests Executing", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Requests/Sec", 60.0);
    // Set the service to transfer logs every minute to the storage account
    diagObj.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);
    // Start Diagnostics Monitor with the new storage account configuration
    // (the setting name below is an assumption; use the one from your service configuration)
    DiagnosticMonitor.Start("DiagnosticsConnectionString", diagObj);
    return base.OnStart();
}
So this code captures a few performance counters into local storage on each role instance, then every minute those values are transferred to table storage.
The trick, now, is to retrieve those values, parse them, evaluate them, and then tweak your role instances accordingly. The Azure API will let you easily pull the perf counters from table storage. However, parsing and evaluating will take some time to build out.
Which leads me to my suggestion that you look at the Azure Dynamic Scaling Example on the MSDN code site. This is a great sample that provides:
A demo line-of-business app hosting a WCF service
A load-generation tool that pushes messages to the service at a rate you specify
A load-monitoring web UI
A scaling engine that can either be run locally or in an Azure role.
It's that last item you want to take a careful look at. It compares your performance counter data, as well as queue-length data, against configured thresholds. Based on the comparisons, it then scales your instances up or down accordingly.
Even if you end up not using this engine, you can see how data is grabbed from table storage, massaged, and used for driving instance changes.
Quantifying the load is actually very application specific - particularly when thinking through the Worker Roles. For example, if you are doing a large parallel processing application, the expected/hoped for behavior would be 100% CPU utilization across the board and the 'scale decision' may be based on whether or not the work queue is growing or shrinking.
Further complicating the decision is the lag time for the various steps - increasing the Role Instance Count, joining the Load Balancer, and/or dropping from the load balancer. It is very easy to get into a situation where you are "chasing" the curve, constantly churning up and down.
As to your specific question about specific VMs, since all VMs in a Role definition are identical, measuring a single VM (unless the deployment starts with VM count 1) should not really tell you much - all VMs are sitting behind a load balancer and/or are pulling from the same queue. Any variance should be transitory.
My recommendation would be to pick something that is not inherently highly variable to monitor (e.g. CPU). Generally, you want to find a trending point - for web apps it may be the response queue, for parallel apps it may be azure queue depth, etc. but for either they would be the trend and not the absolute number. I would also suggest measuring them at fairly broad intervals - minutes, not seconds. If you have a load you need to respond to in seconds, then realistically you will need to increase your running instance count ahead of time.
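To make the "trend, not absolute number" idea concrete, here is a small Python sketch of a scale decision driven by queue depth sampled once per minute. The function names, thresholds and sample values are all hypothetical. It is a simplification for illustration, not how the Dynamic Scaling sample or the Autoscaling Application Block actually work.

```python
def trend(samples):
    """Average change per interval between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)

def scale_decision(queue_depths, grow_per_interval, shrink_per_interval):
    """Return +1 to add an instance, -1 to remove one, 0 to hold.

    Only a sustained trend beyond the thresholds triggers a change, which
    helps avoid churning up and down on short-lived spikes.
    """
    slope = trend(queue_depths)
    if slope >= grow_per_interval:
        return 1
    if slope <= -shrink_per_interval:
        return -1
    return 0

# Queue depth sampled once per minute (illustrative values).
print(scale_decision([100, 150, 220, 300], 50, 50))  # growing steadily
print(scale_decision([300, 310, 295, 305], 50, 50))  # noisy but flat
print(scale_decision([300, 200, 120, 60], 50, 50))   # draining
```

Widening the sampling window or raising the thresholds makes the decision slower but steadier, which matters given the lag between changing the instance count and the new instances actually taking load.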
With regard to your first question, you can also use the Autoscaling Application Block to dynamically change instance counts based on a set of predefined rules.
