Launch EMR cluster via Lambda inside a VPC using boto3 - aws-lambda

I am trying to launch an EMR cluster from AWS Lambda code written in Python with boto3. The Lambda is able to launch the cluster when there is no VPC configuration associated with it. However, as soon as I add the VPC config, it fails to launch the cluster and does not return any error message.
I am trying to run the Lambda inside a default VPC, which has 3 public subnets and a default security group. I have checked that the route table in the VPC is associated with an internet gateway and that the gateway is attached to the VPC.
The execution role provides full access to the CloudWatch, elasticmapreduce, and EC2 actions.
Any help in resolving this schoolboy error will be much appreciated.

Related

Security Group update to allow AWS Lambda function that is not attached to any VPC

There are two applications. One is built on AWS Lambda (in Account A) and the other is deployed on ECS Fargate (in Account B).
The first application (AWS Lambda) consumes an API exposed by the second application (ECS Fargate). I need to allow the AWS Lambda function to access the ECS application (which is behind an Application Load Balancer) through an inbound rule in the security group.
The problem is that the Lambda function is not attached to any VPC and the two applications run in separate AWS accounts. How can I solve this?
Note: It is an internal application, not internet-facing.
If your ECS application's load balancer scheme is set to internal instead of public, then an AWS Lambda function that is not assigned to a VPC would never be able to access it. You are asking about security group rules, but there is no security group rule that will give something on the Internet access to a resource that is not exposed to the Internet.
Your only option to make this work is to move the Lambda function into a VPC, and establish VPC peering between the two VPCs.
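A minimal sketch of those two steps with boto3, assuming placeholder subnet, security group, VPC, and account IDs (none of them come from the question). The helper names are hypothetical; they only build the request parameters, and apply() is the part that makes the real calls and needs credentials for Account A.

```python
def lambda_vpc_config(subnet_ids, security_group_ids):
    """Build the VpcConfig argument for Lambda's update_function_configuration."""
    return {
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


def peering_request(requester_vpc_id, peer_vpc_id, peer_account_id):
    """Build kwargs for EC2's create_vpc_peering_connection (cross-account)."""
    return {
        "VpcId": requester_vpc_id,       # VPC that will host the Lambda (Account A)
        "PeerVpcId": peer_vpc_id,        # VPC that hosts the internal ALB (Account B)
        "PeerOwnerId": peer_account_id,  # 12-digit ID of Account B
    }


def apply(function_name, vpc_config, peering_kwargs):
    """Make the real API calls. Requires boto3 and credentials for Account A."""
    import boto3  # imported lazily so the helpers above work without it

    boto3.client("lambda").update_function_configuration(
        FunctionName=function_name, VpcConfig=vpc_config
    )
    resp = boto3.client("ec2").create_vpc_peering_connection(**peering_kwargs)
    return resp["VpcPeeringConnection"]["VpcPeeringConnectionId"]
```

The peering request still has to be accepted from Account B, both route tables need a route to the peer VPC's CIDR, and the load balancer's security group has to allow inbound traffic from the Lambda side.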

Connect timeout from AWS lambda to AWS codepipeline

I am trying to trigger CodePipeline from Lambda. I took the Lambda Python code from the blog post below.
https://aws.amazon.com/blogs/devops/adding-custom-logic-to-aws-codepipeline-with-aws-lambda-and-amazon-cloudwatch-events/
But when I run it, I get this exception:
Connect timeout on endpoint URL "https://codepipeline.ap-southeast-2.amazonaws.com/"
I have opened all traffic in the security group attached to the Lambda.
Please suggest what else to check here.
Thanks
Sharad
You are running your Lambda function in a VPC (as evidenced by the fact that you said it has a security group attached). A Lambda function in a VPC cannot access anything outside the VPC without a route to a NAT Gateway. A Lambda function in a VPC never gets a public IP assigned to it, so it can never use a VPC Internet Gateway directly. Thus to access anything outside your VPC, such as the AWS API to trigger a CodePipeline run, the Lambda function needs to be deployed only in subnets of your VPC that have a route to a NAT Gateway.
The alternative would be to add a VPC Endpoint for the specific AWS Service you want to access.
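As a rough illustration of the VPC Endpoint option, the sketch below builds the parameters for an interface endpoint and passes them to EC2's create_vpc_endpoint. The function names and all IDs are placeholders, and the service name assumes the usual com.amazonaws.<region>.<service> pattern; check that the service offers an interface endpoint in your region before relying on it.

```python
def interface_endpoint_request(region, service, vpc_id, subnet_ids, security_group_ids):
    """Build kwargs for EC2's create_vpc_endpoint (interface endpoint)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Assumed naming pattern; verify it for the service you need.
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the default endpoint hostname resolve to the endpoint's private IPs.
        "PrivateDnsEnabled": True,
    }


def create_endpoint(request):
    """Make the real API call. Run this from somewhere that can reach the EC2 API."""
    import boto3  # imported lazily so the helper above works without it

    resp = boto3.client("ec2").create_vpc_endpoint(**request)
    return resp["VpcEndpoint"]["VpcEndpointId"]
```

The endpoint's security group must allow inbound HTTPS (port 443) from the Lambda function's security group.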

VPC-enabled Lambda function cannot launch/access EC2 in the same VPC

I have a VPC-enabled Lambda function which attempts to launch an EC2 instance using a launch template. The EC2 launch step (run_instances) fails with the generic network error below.
Calling the invoke API action failed with this message: Network Error
I can launch an instance successfully directly from the launch template, so I think everything is fine with the launch template. I have configured the following in the launch template:
Amazon Machine Image ID
Instance type
Key Pair
A network interface (ENI) which I had created before using a specific (VPC, Subnet, Security Group) combo.
IAM role
The Lambda function includes the code below:
import json
import boto3
import time

def lambda_handler(event, context):
    # EC2 client in the function's own region
    ec2_cl = boto3.client('ec2')
    launch_temp = {"LaunchTemplateId": "<<Launch Template ID>>"}
    # Launch exactly one instance from the template into the given subnet
    resp_ec2_launch = ec2_cl.run_instances(MaxCount=1, MinCount=1, LaunchTemplate=launch_temp, SubnetId="<<Subnet ID>>")
A few things about the Lambda function:
I have passed the subnet in the run_instances() call because this is not the default VPC/subnet.
The function is set up with the same (VPC, Subnet, Security Group) combo as used in the launch template.
The execution role is set up to be the same IAM role as used in the launch template.
The function, as you can see, only needs access to EC2; internet access is not needed.
I replaced run_instances() with describe_instance_status (using the ID of the instance created directly from the launch template) and got the same error.
The error is a network error, so I assume all is fine (at least for now) with the privileges granted to the IAM role. I'm sure there would be a different error if the IAM role were missing any policies.
Can someone indicate what I might be missing?
It appears that the problem is with your AWS Lambda function being able to reach the Internet, since the Amazon EC2 API endpoint is on the Internet.
If a Lambda function is not attached to a VPC, it has automatic access to the Internet.
If a Lambda function is attached to a VPC and requires Internet access, then the configuration should be:
Attach the Lambda function only to private subnet(s)
Launch a NAT Gateway in a public subnet
Configure the Route Table on the private subnets to send Internet-bound traffic (0.0.0.0/0) through the NAT Gateway
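One way to check that setup is to look at where a subnet's route table sends 0.0.0.0/0. The sketch below is only an illustration: the helper names are made up, and the dictionaries follow the shape returned by EC2's describe_route_tables.

```python
def default_route_target(route_table):
    """Return 'nat', 'igw', or None for a route table's 0.0.0.0/0 route."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat"
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
    return None


def suitable_for_lambda_internet_access(route_table):
    """True when Internet-bound traffic leaves through a NAT Gateway."""
    return default_route_target(route_table) == "nat"


def route_table_for_subnet(subnet_id):
    """Fetch the route table explicitly associated with a subnet (needs boto3)."""
    import boto3  # imported lazily so the helpers above work without it

    resp = boto3.client("ec2").describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    tables = resp["RouteTables"]
    # Subnets without an explicit association use the VPC's main route table,
    # which this filter does not return.
    return tables[0] if tables else None
```

A result of 'igw' means the subnet is public, which does not give a VPC-attached Lambda function Internet access because the function has no public IP.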
It appears that your VPC does not have an Internet Gateway, but it does have a VPC Endpoint for EC2.
Therefore, to try and reproduce your situation, I did the following:
Created a new VPC with one subnet but no Internet Gateway
Added a VPC Endpoint for EC2 to the subnet
Created a Lambda function that would call DescribeInstances() and attached the Lambda function to the subnet
Opened the security group on the VPC Endpoint and Lambda function to allow all traffic from anywhere (hey, it's just a test!)
My Lambda function:
import json
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client('ec2',region_name='ap-southeast-2')
    print(ec2.describe_instances())
The result: The Lambda function successfully received a response from EC2, with a list of instances in the region. No code or changes were required.

What is the downside of NOT running AWS Lambda functions in a VPC?

I am running AWS Lambda functions in a VPC.
And during the course of the project I have hit problems because:
no access to my database - had to solve this somehow
no access to AWS SES - had to find a workaround
no access to AWS SQS - removed all queuing functionality from Lambda functions
no access to external Internet - still don't know how to implement reCAPTCHA without Internet access
no access to AWS Cognito - cannot get information about logged-in users
I COULD implement a NAT gateway in the VPC but what is the point of serverless if I have to run a NAT server instance? That's not serverless.
So finally AWS has worn me down and I have decided to give up on running my AWS Lambda functions in a VPC. Without endpoints for Internet proxying and the various AWS services, it's just too hard.
So my question is: what is the downside/disadvantage of running my AWS Lambda functions with no VPC?
If you need access to resources within a VPC, then run your AWS Lambda function within a VPC. If you do not require this access, then do not run it within a VPC.
If you require Internet access, then you should connect your Lambda functions to a Private Subnet and use a NAT Gateway, which is a fully-managed NAT so you can remain serverless. It will solve the problems you listed.
AWS has provided a reference document for Lambda deployments: Serverless Application Lens, AWS Well-Architected Framework. In it they provide the following decision tree:
The only major downside noted is that a Lambda outside of a VPC cannot directly access private resources within a VPC.
One reason to create a Lambda in a VPC would be that you need a specific IP or IP range for it. This could be the case if a system only accepts calls from a specific IP, which would then need to be whitelisted.
A fixed IP for a Lambda function is discussed here: Is there a way to assign a Static IP to a AWS Lambda without VPC?
Downside of not having Lambda in a VPC: no specific IP / IP range for your Lambda function.
In the end I stayed with the VPC but I added an EC2 instance into the VPC and ran TinyProxy on it. I then configured my AWS Lambda functions with the environment variable:
HTTPS_PROXY https://ip-10-0-1-53.eu-west-1.compute.internal:8888
boto3 picked up the environment variable and sent all requests to the proxy. This seems to work fine without the complexity of a NAT gateway.
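If you would rather not depend on the environment variable being picked up implicitly, botocore's Config accepts a proxies mapping. The sketch below is an assumption about how that could be wired up, not what the original poster ran; proxies_from_env and make_client are hypothetical helper names.

```python
import os


def proxies_from_env(environ=None):
    """Collect HTTP(S)_PROXY settings into the mapping botocore's Config expects."""
    environ = os.environ if environ is None else environ
    proxies = {}
    for scheme in ("http", "https"):
        url = environ.get(f"{scheme.upper()}_PROXY") or environ.get(f"{scheme}_proxy")
        if url:
            proxies[scheme] = url
    return proxies


def make_client(service, proxies):
    """Create a boto3 client that sends its requests through the given proxies."""
    import boto3  # imported lazily so proxies_from_env works without it
    from botocore.config import Config

    return boto3.client(service, config=Config(proxies=proxies))
```

Usage would look like make_client("ses", proxies_from_env()), with the proxy host being the EC2 instance that runs TinyProxy.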

Getting information about deployment from within an instance of AWS Elastic Beanstalk

My specific need is to get the list of EC2 instances in the deployment from within one of the instances.
I've tried using the AWS command line, for example aws elb describe-load-balancers, however it just gives details of all the load balancers in my account. I know you can specify a load balancer's name with --load-balancer-name, but I don't have access to that automatically from within the instance.
Perhaps a file can be created on instance creation by placing something in .ebextensions?
You can do it in a two step process using the AWS CLI.
First you get the endpoint for your Elastic Beanstalk application:
aws elasticbeanstalk describe-environments --query='Environments[?ApplicationName==`Your-application-name`].EndpointURL'
Then you use the endpoint to get the instances:
aws elb describe-load-balancers --query='LoadBalancerDescriptions[?DNSName==`load-balancer-end-point-from-previous-step`].Instances[]'
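The same two steps can be sketched in Python with boto3. This assumes a Classic Load Balancer (the elb client), as in the CLI commands above; the function names are illustrative, and the two filter functions mirror the --query expressions on the response dictionaries.

```python
def environment_endpoints(describe_environments_response, application_name):
    """Step 1: endpoint URLs of the environments that belong to the application."""
    return [
        env["EndpointURL"]
        for env in describe_environments_response.get("Environments", [])
        if env.get("ApplicationName") == application_name and "EndpointURL" in env
    ]


def instance_ids_for_endpoint(describe_load_balancers_response, dns_name):
    """Step 2: IDs of the instances registered with the load balancer at dns_name."""
    ids = []
    for lb in describe_load_balancers_response.get("LoadBalancerDescriptions", []):
        if lb.get("DNSName") == dns_name:
            ids.extend(inst["InstanceId"] for inst in lb.get("Instances", []))
    return ids


def list_instance_ids(application_name):
    """Run both steps against the real APIs (needs boto3 and credentials)."""
    import boto3  # imported lazily so the filters above work without it

    envs = boto3.client("elasticbeanstalk").describe_environments(
        ApplicationName=application_name
    )
    # Only the first page of load balancers is read here; paginate for large accounts.
    lbs = boto3.client("elb").describe_load_balancers()
    ids = []
    for endpoint in environment_endpoints(envs, application_name):
        ids.extend(instance_ids_for_endpoint(lbs, endpoint))
    return ids
```

The instance's IAM role needs permission for both describe calls for this to work from inside the instance.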
