2 lambdas and one VPC - only one can access the internet/DynamoDB

I'm running into a very weird problem with AWS Lambdas. I've created two Lambdas, A and B, configured exactly the same way. Both need to access the internet and DynamoDB. I'm trying to move them into a new VPC. The Lambdas sit in VPC subnets because I'm using DAX.
The old VPC has two private /24 subnets; the new one has three private /20 subnets. Each VPC also has a public subnet (/24 and /20 respectively). The route table for the private subnets sends traffic to a NAT gateway, and the public subnets route through an IGW.
The security groups for both allow (inbound) all traffic from the security group itself plus a custom TCP rule for port 8111, and (outbound) all traffic to 0.0.0.0/0.
Moving lambda A to the new VPC works fine. It can access both the Internet and DynamoDB. That tells me that the NAT and IGW are configured correctly (I think).
Moving lambda B to the new VPC fails. It can't access either the Internet or DynamoDB.
Moving either Lambda back to the old VPC works. The DAX cluster endpoint isn't hardcoded in either Lambda, so this isn't a case of me failing to update the code.
Moving a Lambda entails changing its VPC, subnets, and security groups to match the target VPC, and changing the DAX cluster endpoint.
Both the old and new VPCs have endpoints set up for DynamoDB for both the public and private subnets.
Any thoughts on what I should be looking at?
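In case it's useful, here's the sort of check I've been running with the Node SDK to compare the two VPCs (the region and subnet IDs below are placeholders):

    // Sketch: print every route for each subnet a Lambda is attached to, to
    // confirm there is a 0.0.0.0/0 route to a NAT gateway (nat-...) and a
    // DynamoDB prefix-list route to a gateway endpoint (vpce-...).
    // Note: subnets with no explicit route table association fall back to the
    // VPC's main route table and won't show up with this filter.
    const AWS = require('aws-sdk');
    const ec2 = new AWS.EC2({ region: 'us-east-1' });

    async function checkSubnets(subnetIds) {
        for (const subnetId of subnetIds) {
            const res = await ec2.describeRouteTables({
                Filters: [{ Name: 'association.subnet-id', Values: [subnetId] }]
            }).promise();
            for (const table of res.RouteTables) {
                for (const route of table.Routes) {
                    console.log(subnetId,
                        route.DestinationCidrBlock || route.DestinationPrefixListId,
                        route.GatewayId || route.NatGatewayId);
                }
            }
        }
    }

    checkSubnets(['subnet-aaaa1111', 'subnet-bbbb2222']).catch(console.error);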

Related

Lambda function times out trying to connect to RDS if in VPC, but doesn't if outside VPC

I have a single AWS lambda function that connects to a single AWS RDS Postgres db and simply returns a json list of all records in the db.
If I don't assign a VPC to the lambda function, it is able to access the AWS RDS db. However, if I assign a VPC to the lambda function it can no longer access the db.
The VPC is the same for both the lambda function and the RDS db. I've also temporarily opened all traffic (all ports, 0.0.0.0/0) for inbound and outbound connections to find the issue, but I am still unable to connect.
I believe it might be a role permission issue related to VPC access for the lambda function, but I've already assigned the AmazonVPCFullAccess policy to the lambda role.
The fact that the lambda can access the DB when not in a VPC is a bit troubling in the sense that the DB is then probably public.
A common mistake is deploying the lambda to a public subnet. Lambdas only get assigned private IP addresses in a VPC. When deployed to a public subnet, its only route to the internet is the internet gateway. That doesn't work well when the lambda itself has a private IP address (the internet couldn't route traffic back to you :P).
One part of the solution is to make sure your lambda is deployed to a private subnet instead with a route to a NAT gateway if it needs access to public resources.
However, the better part of the solution is to actually put the database in a private subnet WITHOUT a public IP address.
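If you want to confirm where it breaks, a bare TCP test from inside the function separates routing problems from security group problems. A minimal sketch (the host and port are placeholders for your RDS endpoint):

    // Sketch: attempt a raw TCP connection to the database endpoint.
    // A hang/timeout usually points at routing or security groups; an
    // immediate "connection refused" means the host was at least reachable.
    const net = require('net');

    exports.handler = (event, context, callback) => {
        const socket = net.createConnection({ host: 'mydb.example.internal', port: 5432 });
        socket.setTimeout(5000);
        socket.on('connect', () => { socket.end(); callback(null, 'reachable'); });
        socket.on('timeout', () => { socket.destroy(); callback(new Error('connect timed out')); });
        socket.on('error', (err) => callback(err));
    };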
Because I've seen many mistakes like this with my customers, and because it can't be stressed enough: I'd strongly suggest you follow a three-tier networking model with your VPCs. This basically means:
Don't use the default VPC. Create your own.
Create 9 subnets:
3 public
3 private. Put your private lambdas here.
3 isolated. Put your database here.
There are lots of articles/templates available that do this for you. A quick Google search gives me:
https://github.com/aws-samples/vpc-multi-tier
https://www.wellarchitectedlabs.com/reliability/100_labs/100_deploy_cloudformation/1_deploy_vpc/
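As a rough sketch of that layout with the Node SDK (the VPC ID, region, and CIDR blocks are example values; what actually makes a tier public, private, or isolated is the route table you associate with each subnet afterwards):

    // Sketch: one public, one private, and one isolated subnet per AZ,
    // nine subnets in total. All IDs and CIDRs are placeholders.
    const AWS = require('aws-sdk');
    const ec2 = new AWS.EC2({ region: 'us-east-1' });

    const azs = ['us-east-1a', 'us-east-1b', 'us-east-1c'];
    const tiers = ['public', 'private', 'isolated'];

    async function createSubnets(vpcId) {
        let octet = 0;
        for (const tier of tiers) {
            for (const az of azs) {
                const res = await ec2.createSubnet({
                    VpcId: vpcId,
                    AvailabilityZone: az,
                    CidrBlock: `10.0.${octet}.0/24`
                }).promise();
                console.log(tier, az, res.Subnet.SubnetId);
                octet += 1;
            }
        }
    }

    createSubnets('vpc-0123456789abcdef0').catch(console.error);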

What does it mean when AWS Lambda is configured with a VPC?

Even after going through the AWS documentation and various blogs, I still don't understand how AWS Lambda behaves when it is configured with a VPC.
When AWS Lambda is configured with a VPC, does that mean all instances of the Lambda get an IP address from the specified subnet of that VPC?
How does the ENI play a role in the AWS Lambda-VPC configuration?
The formula for ENI capacity from AWS doc -
Projected peak concurrent executions * (Memory in GB / 3GB)
Does that mean each running Lambda instance has 3 GB of memory? And when that is exceeded, does another ENI get attached?
Most of the AWS Lambda-VPC architecture diagrams show Lambda inside the VPC. Does that mean Lambda runs inside the VPC?
Here, I’m sure I’m missing a few pieces of information. Any pointers would be helpful.
When you configure a Lambda function to run in a VPC, it uses an ENI that is created with an IP address in one of the subnets you select. Based on the formula for the expected number of ENIs needed, it seems that ENIs can be shared between Lambdas.
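As a made-up worked example of that formula: 100 peak concurrent executions at 1,536 MB each projects to 100 * (1.5 / 3) = 50 ENIs.

    // Back-of-the-envelope ENI estimate using the documented formula.
    // Both inputs are example values, not from a real workload.
    const peakConcurrentExecutions = 100;
    const memoryInGb = 1.5; // 1536 MB per function instance
    console.log(peakConcurrentExecutions * (memoryInGb / 3)); // 50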
There are only two reasons that I know of for running your Lambda in a VPC:
It needs to access resources inside your VPC that do not have a public endpoint, e.g. Redis/Memcached caching clusters (ElastiCache) or an RDS/Redshift cluster that doesn't have a public IP (it's a good idea not to have public IPs on databases). When your Lambda runs inside the VPC, it uses a private IP and can connect to the private resources in your VPC.
You need your Lambdas to have a consistent IP address (perhaps for a service that only allows whitelisting of IPs for authentication). This is achieved by using a NAT gateway.
Lambda functions cannot receive inbound connections in any case.
Disadvantages of putting your Lambda in a VPC are:
Slower cold start times, since an ENI might need to be provisioned.
You need a NAT gateway (or VPC endpoint) to access external resources.
Needing to manage concurrency and available IP addresses more closely.

AWS Lambda times out connecting to RedShift

My Redshift cluster is in a private VPC. I've written the following AWS Lambda in Node.js, which should connect to Redshift (dressed down for this question):
'use strict';
console.log('Loading function');
const pg = require('pg');

exports.handler = (event, context, callback) => {
    const client = new pg.Client({
        user: 'myuser',
        database: 'mydatabase',
        password: 'mypassword',
        port: 5439,
        host: 'myhost.eu-west-1.redshift.amazonaws.com'
    });

    // Connect to our database and finish the invocation through the
    // callback instead of throwing from inside the connect callback.
    console.log('Connecting...');
    client.connect(function (err) {
        if (err) {
            return callback(err);
        }
        console.log('CONNECTED!!!');
        client.end();
        callback(null, 'connected');
    });
};
I keep getting Task timed out after 60.00 seconds unfortunately. I see in the logs "Connecting...", but never "CONNECTED!!!".
Steps I've taken so far to get this to work:
As per Connect Lambda to Redshift in Different Availability Zones, I have the Redshift cluster and the Lambda function in the same VPC.
The Redshift cluster and the Lambda function are also on the same subnet.
The Redshift cluster and the Lambda function share the same security group.
Added an inbound rule to the security group of the Redshift cluster as per the suggestion here (https://github.com/awslabs/aws-lambda-redshift-loader/issues/86).
The IAM role associated with the Lambda function has the following policies: AmazonDMSRedshiftS3Role, AmazonRedshiftFullAccess, AWSLambdaBasicExecutionRole, AWSLambdaVPCAccessExecutionRole, AWSLambdaENIManagementAccess, scrambled together from this source: http://docs.aws.amazon.com/lambda/latest/dg/vpc.html (I realize I have some overlap here, but figured that it shouldn't matter).
Added an Elastic IP to the inbound rules of the security group as per an answer to a question listed prior (even though I don't have a NAT gateway configured in the subnet).
I don't have Enhanced VPC Routing enabled because I figured that I don't need it.
Even tried it by adding the Inbound rule 0.0.0.0/0 ALL types, ALL protocols, ALL ports in the Security Group (following this question: Accessing Redshift from Lambda - Avoiding the 0.0.0.0/0 Security Group). But same issue!
So, does anyone have any suggestions as to what I should check?
*I should add that I am not a network expert, so perhaps I've made a mistake somewhere.
The timeout is probably because your Lambda in the VPC cannot access the internet in order to connect to your cluster (you seem to be using the public hostname to connect). Your connection options depend on your cluster configuration. Since both your Lambda function and cluster are in the same VPC, you should use the private IP of your cluster to connect to it. In your case, I think simply using the private IP should solve your problem.
Depending on whether your cluster is publicly accessible, there are some points to keep in mind.
If your cluster is configured to NOT be publicly accessible, you can use the private IP to connect to the cluster from your lambda running in a VPC and it should work.
If you have a publicly accessible cluster in a VPC, and you want to connect to it by using the private IP address from within the VPC, make sure the following VPC parameters are set to true/yes:
DNS resolution
DNS hostnames
The steps to verify/change these settings are given here.
If you do not set these parameters to true, connections from within the VPC will resolve to the EIP instead of the private IP, and your Lambda won't be able to connect without having internet access (which would need a NAT gateway or a NAT instance).
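A quick way to see which address the hostname resolves to from inside the VPC is a lookup like this sketch (the hostname is a placeholder):

    // Sketch: resolve the cluster hostname from a Lambda running in the VPC.
    // If this prints a public IP, the DNS settings above aren't taking effect
    // and the connection will try to leave the VPC.
    const dns = require('dns');

    exports.handler = (event, context, callback) => {
        dns.lookup('myhost.eu-west-1.redshift.amazonaws.com', (err, address) => {
            if (err) return callback(err);
            console.log('resolved to', address);
            callback(null, address);
        });
    };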
Also, an important note from the documentation here.
If you have an existing publicly accessible cluster in a VPC, connections from within the VPC will continue to use the EIP to connect to the cluster even with those parameters set, until you resize the cluster. Any new clusters will follow the new behavior of using the private IP address when connecting to the publicly accessible cluster from within the same VPC.
My issues got resolved after adding the CIDR range of the VPC to the Redshift Inbound rules.
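For reference, the rule looks roughly like this via the SDK (the group ID and CIDR are placeholders):

    // Sketch: allow inbound Redshift traffic (port 5439) from the VPC CIDR.
    const AWS = require('aws-sdk');
    const ec2 = new AWS.EC2({ region: 'eu-west-1' });

    ec2.authorizeSecurityGroupIngress({
        GroupId: 'sg-0123456789abcdef0',
        IpPermissions: [{
            IpProtocol: 'tcp',
            FromPort: 5439,
            ToPort: 5439,
            IpRanges: [{ CidrIp: '10.0.0.0/16' }]
        }]
    }).promise().then(() => console.log('rule added')).catch(console.error);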
For those trying to move to Redshift Serverless due to its recent release to the public... this may be a common issue, but at least for me the answer from pcothenet worked:
For what it's worth, I had a similar issue. My problem was that I had set the lambda to have access to my public subnets only. My public subnet is routing all outbound traffic to an internet gateway, while my private subnets are routing outbound traffic via a NAT Gateway. But according to the doc, "You cannot use an Internet gateway attached to your VPC, since that requires the ENI to have public IP addresses." Switching the lambda to the private subnets (and therefore using the NAT Gateway) solved the problem. – pcothenet
You must use the endpoint to connect.
I had this same issue and followed the steps above, and I found that in my case the Lambda was in a subnet that did not have a route to the NAT gateway. So I moved the Lambda into a subnet with a route to the NAT gateway.

VPC EC2 -> Multi-AZ RDS Unable to connect

I have a VPC with a mix of public and private subnets. I wanted the DB server, among other things, in a private subnet. For now I have my web servers in 2 public subnets. I spun up a multi-AZ VPC RDS instance into a subnet group that contains 3 dedicated private subnets, each in its own AZ.
Here's the issue. I can connect from one of my public EC2 instances to RDS, but not from the other. I was sharing security groups and ACLs for those two public subnets, so that shouldn't be the issue. As best I can tell, the only thing that was different was that the public EC2 that could connect was in the same AZ as the RDS primary node, while the other EC2 instance was in a different AZ (the same one as the RDS failover). When I ping to the RDS domain name from the non-working public EC2, it resolves the private IP just fine, so that doesn't seem to be the issue...it's as if it just isn't routing correctly. Any ideas?
EDIT: I also tried making the private subnets public by updating the route table, that didn't work either. It really seems to be related to the different AZ.
Well, in the end I terminated the instance, started a new one in the EXACT SAME AZ I was in before, based on the exact same AMI I had based the previous instance on, made basically no changes, and all of a sudden it worked as expected. Go figure.

Amazon ELB in VPC

We're using Amazon EC2, and we want to put an ELB (load balancer) in front of 2 instances in a private subnet. If we just add the private subnet to the ELB, it will not get any connections; if we attach both subnets to the ELB, it can access the instances but often gets time-outs. Has anyone successfully implemented an ELB within the private subnet of their VPC? If so, could you perhaps explain the procedure to me?
Thanks
My teammate and I have just implemented an ELB in a VPC with 2 private subnets in different availability zones. The reason you get timeouts is that for each subnet you add to the load balancer, it gets one external IP address (try 'dig elb-dns-name-here' and you will see several IP addresses). If one of these IP addresses maps to a private subnet, it will time out. The IP that maps into your public subnet will work. Because DNS may give you any one of the IP addresses, sometimes it works and sometimes it times out.
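You can reproduce the dig check in Node too; a sketch (the ELB DNS name is a placeholder):

    // Sketch: list every A record behind the ELB's DNS name. With one public
    // and one private subnet attached you should see multiple addresses,
    // which explains why connections only fail some of the time.
    const dns = require('dns');

    dns.resolve4('my-elb-1234567890.us-east-1.elb.amazonaws.com', (err, addresses) => {
        if (err) throw err;
        addresses.forEach((ip) => console.log(ip));
    });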
After some back and forth with amazon, we discovered that the ELB should only be placed in 'public' subnets, that is subnets that have a route out to the Internet Gateway. We wanted to keep our web servers in our private subnets but allow the ELB to talk to them. To solve this, we had to ensure that we had a corresponding public subnet for each availability zone in which we had private subnets. We then added to the ELB, the public subnets for each availability zone.
At first, this didn't seem to work, but after trying everything, we recreated the ELB and everything worked as it should. I think this is a bug, or the ELB was just in an odd state from so many changes.
Here is more or less what we did:
WebServer-1 is running in PrivateSubnet-1 in availability zone us-east-1b with security group called web-server.
WebServer-2 is running in PrivateSubnet-2 in availability zone us-east-1c with security group called web-server.
Created a public subnet in zone us-east-1b, we'll call it PublicSubnet-1. We ensured that we associated the routing table that includes the route to the Internet Gateway (ig-xxxxx) with this new subnet. (If you used the wizard to create a public/private VPC, this route already exists.)
Created a public subnet in zone us-east-1c, we'll call it PublicSubnet-2. We ensured that we associated the routing table that includes the route to the Internet Gateway (ig-xxxxx) with this new subnet. (If you used the wizard to create a public/private VPC, this route already exists.)
Created a new ELB, adding to it PublicSubnet-1 and PublicSubnet-2 (not the PrivateSubnet-X). Also picked the instances to run behind the ELB, in this case WebServer-1 and WebServer-2. Made sure to assign a security group that allows incoming ports 80 and 443. Let's call this group elb-group.
In the web-server group, allow traffic on ports 80 and 443 from the elb-group (a sketch of both rules follows below).
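A sketch of those two rules with the Node SDK (both group IDs are placeholders): elb-group admits the world on 80/443, and web-server admits only traffic from elb-group.

    // Sketch: wire up the ELB and web-server security groups.
    const AWS = require('aws-sdk');
    const ec2 = new AWS.EC2({ region: 'us-east-1' });

    function rules(ports, source) {
        return ports.map((port) => ({
            IpProtocol: 'tcp', FromPort: port, ToPort: port, ...source
        }));
    }

    async function wireGroups(elbGroupId, webGroupId) {
        // elb-group: allow 80/443 from anywhere.
        await ec2.authorizeSecurityGroupIngress({
            GroupId: elbGroupId,
            IpPermissions: rules([80, 443], { IpRanges: [{ CidrIp: '0.0.0.0/0' }] })
        }).promise();
        // web-server: allow 80/443 only from the ELB's group.
        await ec2.authorizeSecurityGroupIngress({
            GroupId: webGroupId,
            IpPermissions: rules([80, 443], { UserIdGroupPairs: [{ GroupId: elbGroupId }] })
        }).promise();
    }

    wireGroups('sg-0aaaabbbb1111cccc', 'sg-0ddddeeee2222ffff').catch(console.error);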
The key here is understanding that you are not "adding subnets/availability zones" to the ELB, but rather specifying which subnets to put ELB instances into.
Yes, ELB is a software load balancer, and when you create an ELB object, a custom load-balancing EC2 instance is put into all of the subnets that you specified. So for the ELB (its instances) to be accessible, they have to be put into subnets that have a default route configured via the IGW (most likely you classified these subnets as public).
So, as was already answered above, you have to specify "public" subnets for the ELB, and those subnets should be from the AZs where your EC2 instances are running. In this case the ELB instances will be able to reach your EC2 instances (as long as security groups are configured correctly).
We've implemented an ELB in a private subnet, so the statement that all ELBs need to be public isn't completely true. You do need a NAT. Create a private subnet for the private ELBs, turn on VPC DNS, and then make sure the private routing table is configured to go through the NAT. The subnet security groups also need to be set up to allow traffic between the ELB and app subnets, and between the app and DB subnets.
Beanstalk health checks won't work as they can't reach the load balancer, but for services that need to be outside of the public reach this is a good compromise.
Suggested reading to get your VPC architecture started: http://blog.controlgroup.com/2013/10/14/guided-creation-of-cloudformation-templates-for-vpc/.
You must add the following settings:
Public subnet zone b = NAT server
Private subnet zone c = web server
Public subnet zone c = ELB
The trick is the routing (a sketch follows below):
The route table for the NAT server's subnet points to the internet gateway.
The route table for the web server's subnet points to the NAT server.
The route table for the public subnet points to the internet gateway.
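In SDK terms, the three default routes look roughly like this (all IDs are placeholders; note that a NAT instance is targeted by instance ID, while an internet gateway is targeted by gateway ID):

    // Sketch: default routes for the NAT, web, and public subnets.
    const AWS = require('aws-sdk');
    const ec2 = new AWS.EC2({ region: 'us-east-1' });

    async function wireRoutes() {
        // The two public subnets (NAT server and ELB) route to the IGW.
        for (const routeTableId of ['rtb-0aaa1111', 'rtb-0ccc3333']) {
            await ec2.createRoute({
                RouteTableId: routeTableId,
                DestinationCidrBlock: '0.0.0.0/0',
                GatewayId: 'igw-0bbb2222'
            }).promise();
        }
        // The web server's private subnet routes through the NAT instance.
        await ec2.createRoute({
            RouteTableId: 'rtb-0eee5555',
            DestinationCidrBlock: '0.0.0.0/0',
            InstanceId: 'i-0ddd4444'
        }).promise();
    }

    wireRoutes().catch(console.error);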
ELB details:
1. Zone: Public subnet zone c
2. Instances: web server
3. Security groups: open the required ports
http://docs.amazonaws.cn/en_us/ElasticLoadBalancing/latest/DeveloperGuide/UserScenariosForVPC.html
Adding a diagram to Nathan's answer. Full medium post here: https://nav7neeet.medium.com/load-balance-traffic-to-private-ec2-instances-cb07058549fd
