NAT Instance maintenance - aws-lambda

I have a Django app deployed on AWS Lambda through Zappa, and my app needs to communicate with the public internet, so I need to use a NAT Instance. I am using a NAT instance because, on the free tier, it's about 10x cheaper than a NAT Gateway. The downside is that unlike a NAT Gateway, a NAT Instance needs actual maintenance, and I am unsure what type of maintenance it needs. I want to learn what I need to do to keep my server running well and healthy.
What are things I can do to make sure of that?
Here is my AWS Architecture:
All of the following is in my VPC. I have 1 subnet in ca-central-1a and 1 in ca-central-1b. In the route table, both subnets point to my NAT Instance. I have a 3rd subnet in ca-central-1b and in the route table it points to an internet gateway. My NAT Instance is in ca-central-1b.
My NAT Instance security group NATSG has HTTP and HTTPS inbound from both of my subnets in ca-central-1a and ca-central-1b and outbound to 0.0.0.0/0. Should I make another NAT Instance in ca-central-1a and have it allow inbound only from the subnet in ca-central-1a, i.e. one NAT Instance per subnet? Would that be healthier/safer?
Extra information:
I disabled Source/dest check. Was that a good idea?
For my AMI I chose a recent community AMI amzn-ami-vpc-nat and I created an Auto Scaling Group which has my NAT instance. It only has 1 instance; is there any point to the Auto Scaling Group if there's only 1 instance in it? I am not sure that I am using the Auto Scaling Group right; I simply created it but haven't configured anything.

Maintenance for a NAT instance mainly means keeping up with security updates, keeping its security group rules current, and recovering from instance failures.
It's not necessary to place a NAT instance in every subnet; multiple private subnets can route through a single NAT instance. It is also recommended to place the NAT instance in a public subnet.
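For reference, routing the private subnets through a single NAT instance is just a default route in the private route table that targets the instance. A minimal CLI sketch, where the route table and instance IDs below are placeholders:
# Send all internet-bound traffic from the private subnets to the NAT instance
# (route table and instance IDs are placeholders for your own).
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 0.0.0.0/0 \
    --instance-id i-0123456789abcdef0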
The source/destination check is enabled by default on every EC2 instance; it means the instance must be the source or destination of any traffic it sends or receives. It must be disabled on a NAT instance, because a NAT instance is neither the source nor the destination of the traffic it handles; it just acts as an intermediary that forwards traffic for the private instances.
The link below gives a detailed description of disabling source/destination checks:
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html#EIP_Disable_SrcDestCheck
Setting the desired capacity to 1 will keep one NAT instance up at all times. The concern is that when a NAT instance gets terminated, the Auto Scaling group will launch a replacement NAT instance that has the source/destination check enabled by default, so it would have to be disabled manually. Also, the route table entries that were created with the old nat-instance-id as the target will not be updated, so the route table will still be pointing at the instance that was terminated. To get the SourceDestCheck attribute disabled on the newly launched NAT instance, you can do it from the User Data of the instance. Here is an example shell script.
#!/bin/bash
# Look up this instance's ID and Availability Zone from the instance metadata service.
EC2_INSTANCE_ID=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
EC2_AVAIL_ZONE=$(wget -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone)
# Derive the region by stripping the trailing AZ letter (e.g. ca-central-1a -> ca-central-1).
EC2_REGION=$(echo "$EC2_AVAIL_ZONE" | sed 's/[a-z]$//')
echo "Region:" $EC2_REGION
# Disable the source/destination check so this instance can forward NAT traffic.
aws ec2 modify-instance-attribute --instance-id $EC2_INSTANCE_ID --source-dest-check "{\"Value\": false}" --region $EC2_REGION
rc=$?; if [[ $rc != 0 ]]; then echo "Failure:" $rc; exit $rc; fi
echo "Success"

Sorry, @Rony Azrak, for the delayed response. Since your concern is configuring instance details after launch, we assume you are asking about updating the user-data script. The way to do that on a running instance is to run the script through a shell: save the given script in a .sh file, say a.sh, and execute it with sh a.sh.
But these changes will only apply to that specific instance; they will not carry over to the next instance that may get launched through auto scaling, if you are using it. For that you need to create a new launch configuration with the required modification, adding the script in the Advanced Details (user data) section, since an existing launch configuration can't be edited. This ultimately means launching a new instance.
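If you prefer the CLI over the console, creating the new launch configuration with the user-data script might look roughly like this; the AMI ID, instance type, and names are placeholders:
# Create a new launch configuration that runs a.sh as user data at boot (names/IDs are placeholders).
aws autoscaling create-launch-configuration \
    --launch-configuration-name nat-instance-lc-v2 \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro \
    --associate-public-ip-address \
    --user-data file://a.sh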
As for Auto Scaling, we would suggest using it, since it automates the task of launching a replacement instance. It does not incur any extra charge; you pay only for the resources you use.
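A hypothetical follow-up command then points the existing group at the new launch configuration and keeps exactly one NAT instance running; the group and launch configuration names are placeholders:
# Attach the new launch configuration and pin the group at a single NAT instance.
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name nat-instance-asg \
    --launch-configuration-name nat-instance-lc-v2 \
    --min-size 1 --max-size 1 --desired-capacity 1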

Related

In AWS, how do I configure SSM for an instance joined to an AWS AD Domain in a Private Subnet?

I am trying to set up SSM on Windows.
I have an ASG in a private subnet (absolutely 0 internet access). I can not use NAT, only VPC endpoints.
In the instance launch configuration, I have a PowerShell script that uses Set-DnsClientServerAddress so that the instance can find and join an AWS Managed MS AD service. I would also like to set up the instance so it can be fully managed with SSM.
The problem comes with the DNS Client Server Address.
When I set it to match the address of the AD service SSM will not work.
When I leave the DNS Client Server Address default, SSM works but I can not join the AD.
I tried forcing the SSM Agent to use the endpoints by creating an amazon-ssm-agent.json file and setting all three endpoints in there. This allowed the instance to show on the Managed Instance list, but its status never changed from pending and requests from within the instance still timed out.
Does anyone know the magic sauce to get these things all working at the same time?
EDIT 1:
I also tried adding a forwarder as described in this thread, however I'm either missing something or it is not working for my case:
https://forums.aws.amazon.com/thread.jspa?messageID=919331&#919331
It turns out that adding the forwarder as described in the link above worked. The part I was missing was joedaws' comment, "I would also remove the existing 169.254.169.253 entry so that only the 10.201.0.2 ip address is in the list".
Of course, my IPs are different, but once I removed the preexisting forward so that my x.x.x.2 IP was the only one in the list (I did this for both of the AD DNS servers) the instance was discoverable by SSM.
So, I would make a minor change to the list that saugy wrote:
On a domain joined windows instance, log in with AD domain Admin user
Open DNS manager
Connect to one of the DNS IP addresses for the AWS AD
Select forwarders
Add the VPC's DNS IP (x.x.x.2 from your VPC's CIDR range)
Remove the existing IP (so your VPC's IP is the only one)
Click Apply
Repeat from step 3 with the other DNS IP address for the AWS AD
Also, as mentioned in the other post, this only has to be done once and the settings persist in the AD DNS.
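For reference, the SSM Agent in a subnet with no internet access needs interface endpoints for ssm, ssmmessages, and ec2messages. A rough CLI sketch for creating them, where the VPC, subnet, and security group IDs and the region are placeholders:
# Create the three interface endpoints SSM needs; private DNS lets the VPC resolver answer for them.
for SERVICE in ssm ssmmessages ec2messages; do
    aws ec2 create-vpc-endpoint \
        --vpc-id vpc-0123456789abcdef0 \
        --vpc-endpoint-type Interface \
        --service-name com.amazonaws.us-east-1.$SERVICE \
        --subnet-ids subnet-0123456789abcdef0 \
        --security-group-ids sg-0123456789abcdef0 \
        --private-dns-enabled
done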

Terraform setup tips: TLS communication across VPCs

I'm working for a client that has a simple enough problem:
They have EC2s in two different Regions/VPCs that are hosting microservices. Up to this point all EC2s only needed to communicate with EC2 instances in the same subnet, but now we need to provision our infrastructure so that specific EC2s in VPC A's public subnet can call specific EC2s in VPC B's public subnet (and vice versa). Communication would be RESTful API calls over HTTPS/TLS.
This is nothing revolutionary but IT moves slowly and I want to create a Terraform proof of concept that:
Creates two VPCs
Creates a public subnet in each
Creates an EC2 in each
Installs httpd in the EC2 along with a Cert to use SSL/TLS
Creates the proper security groups so that only IPs associated with the specific instance can call the relevant service
There is no containerization at this client, just individual EC2s for each app with 1 or 2 backups to distribute the load. I'm working with terraform so I can submit different ideas to them for consideration, such as using VPC Peering, Elastic IPs, NAT Gateways, etc.
I can see how to use Terraform to make these infrastructural changes, but I'm not sure how to create EC2s that install a server that can use a temp cert to demonstrate HTTPS traffic. I see a tech called Packer, but was also thinking I should just create a custom AMI that does this.
What would the best solution be? This doesn't have to be production-ready so I'm favoring creating a fast stable proof-of-concept.
I would use the EC2 user_data option in Terraform to install httpd and create your SSL cert. Packer is great if you want to create AMIs to spin up, but since this is a POC and you are not doing any complex configuration that would take long to perform, I would just use user_data.
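As a rough illustration (not production-ready), a user_data script along these lines would install httpd with mod_ssl and generate a temporary self-signed cert; it assumes an Amazon Linux 2 AMI, and the hostname in the cert is a placeholder:
#!/bin/bash
# Install Apache and mod_ssl (Amazon Linux 2; adjust the package manager for other distros).
yum install -y httpd mod_ssl
# Generate a temporary self-signed certificate; the CN is a placeholder for the demo.
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -subj "/CN=poc.example.internal" \
    -keyout /etc/pki/tls/private/poc.key \
    -out /etc/pki/tls/certs/poc.crt
# Point mod_ssl at the generated cert and key.
sed -i 's|^SSLCertificateFile .*|SSLCertificateFile /etc/pki/tls/certs/poc.crt|' /etc/httpd/conf.d/ssl.conf
sed -i 's|^SSLCertificateKeyFile .*|SSLCertificateKeyFile /etc/pki/tls/private/poc.key|' /etc/httpd/conf.d/ssl.conf
# Serve a trivial page so the HTTPS call has something to return.
echo "hello from $(hostname)" > /var/www/html/index.html
systemctl enable --now httpd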

Why can't I join my AWS EC2 instance to Active Directory?

I'm unable to join an EC2 instance to my Directory Services Simple AD in Amazon Web Services manually, per Amazon's documentation.
I have a Security Group attached to my instance which allows HTTP and RDP only from my IP address.
I'm entering the FQDN foo.bar.com.
I've verified that the Simple AD and the EC2 instance are in the same (public, for the moment) subnet.
DNS appears to be working (because tracert to my IP gives my company's domain name).
I cannot tracert to the Simple AD's IP address (it doesn't even hit the first hop)
I cannot tracert to anything on the Internets (same as above).
arp -a shows the IP of the Simple AD, so it appears my instance has received traffic from the Simple AD.
This is the error message I'm receiving:
The following error occurred when DNS was queried for the service
location (SRV) resource record used to locate an Active Directory
Domain Controller (AD DC) for domain "aws.bar.com":
The error was: "This operation returned because the timeout period
expired." (error code 0x000005B4 ERROR_TIMEOUT)
The query was for the SRV record for _ldap._tcp.dc._msdcs.aws.bar.com
The DNS servers used by this computer for name resolution are not
responding. This computer is configured to use DNS servers with the
following IP addresses:
10.0.1.34
Verify that this computer is connected to the network, that these are
the correct DNS server IP addresses, and that at least one of the DNS
servers is running.
The problem is that the Security Group rules as currently constructed are blocking the AD traffic. Here are the key concepts:
Security Groups are whitelists, so any traffic that's not explicitly allowed is disallowed.
Security Groups are attached to each EC2 instance. Think of Security Group membership like having a copy of an identical firewall in front of each node in the group. (In contrast, Network ACLs are attached to subnets. With a Network ACL you would not have to specify allowing traffic within the subnet because traffic within the subnet does not cross the Network ACL.)
Add a rule to your Security Group which allows all traffic to flow within the subnet's CIDR block and that will fix the problem.
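For example, with a hypothetical security group ID, and assuming the subnet from the question is 10.0.1.0/24, the rule could be added like this:
# Allow all protocols and ports from within the subnet's CIDR block (placeholder group ID and CIDR).
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol -1 \
    --cidr 10.0.1.0/24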
The answer marked as accepted is incorrect.
Both of my AWS EC2 instances are in the same VPC and the same subnet, with the same security group.
I have the same issue. My security group's inbound and outbound rules were posted as screenshots.
I can also ping between the DC and the other host, bi-directionally, with replies on both sides.
I also have the DC IP address set as the primary and only DNS server on the other EC2 instance.
AWS has some weird sorcery preventing a secondary EC2 instance from joining the EC2 domain controller, unless using their managed AD services which I am NOT using.
The other EC2 instance has the DC IP address set as its primary DNS, and given that each host can ping the other, I should have ZERO problems joining the domain.
I had a very similar problem, where at first LDAP over UDP (and before that, DNS) was failing to connect, even though the port tests were fine, resulting in the same kind of error (in network traces, communication between standalone server EC2 instance and the DC instance stopped at "CLDAP 201 searchRequest(4) "" baseObject", with nothing being returned). Did all sorts of building and rebuilding, only to find out that I was inadvertently blocking UDP traffic, which AWS needs for both LDAP and DNS. I had allowed TCP only, and the "All Open" test SG I was using was also TCP only.
D'oh!!!

Amazon - can't connect to instance behind VPC

For testing purposes, I set up a VPC on Amazon and created an instance within the VPC. I've added a route for 0.0.0.0/0 to an internet gateway in the attached route table, and given the instance an elastic IP address. I'm unable to ssh or ping it, even when I set the security group to allow all traffic. I must be missing something obvious. What am I doing wrong?
It turns out that when I created the instance, I accepted the default security group, which only allows access from a specific IP address. When I added another instance, I created it with a security group which allowed all traffic, and I was able to ping it.

Amazon EC2 autoscaling instances with elastic IPs

Is there any way to make new instances added to an autoscaling group associate with an elastic IP? I have a use case where the instances in my autoscale group need to be whitelisted on remote servers, so they need to have predictable IPs.
I realize there are ways to do this programmatically using the API, but I'm wondering if there's any other way. It seems like CloudFormation may be able to do this.
You can associate an Elastic IP to ASG instances using manual or scripted API calls just as you would any other instance -- however, there is no automated way to do this. ASG instances are designed to be ephemeral/disposable, and Elastic IP association goes against this philosophy.
To solve your problem re: whitelisting, you have a few options:
If the system that requires predictable source IPs is on EC2 and under your control, you can disable IP restrictions and use EC2 security groups to secure traffic instead
If the system is not under your control, you can set up a proxy server with an Elastic IP and have your ASG instances use the proxy for outbound traffic
You can use http://aws.amazon.com/vpc/ to gain complete control over instance addressing, including network egress IPs -- though this can be time consuming
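If you do go the scripted route mentioned at the start of this answer, a minimal user-data sketch that pulls an address from a pre-allocated, pre-tagged pool at boot might look like this; the region, tag name, and the idea of tagging the pool are assumptions, and the instance profile needs ec2:DescribeAddresses and ec2:AssociateAddress:
# Hypothetical boot-time association of an Elastic IP from a pre-tagged pool (placeholder region/tag).
INSTANCE_ID=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
# Pick the first EIP carrying the pool tag; add your own logic to skip addresses already in use.
ALLOCATION_ID=$(aws ec2 describe-addresses --region us-east-1 \
    --filters Name=tag:Pool,Values=asg-eip-pool \
    --query 'Addresses[0].AllocationId' --output text)
aws ec2 associate-address --region us-east-1 \
    --instance-id "$INSTANCE_ID" --allocation-id "$ALLOCATION_ID"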
There are 3 approaches I could find to doing this. CloudFormation will just automate it, but you need to understand what's going on first.
1.-As @gabrtv mentioned, use a VPC; this lends itself to two options.
1.1-Within a VPC, use a NAT Gateway to route all traffic in and out through the Gateway. The Gateway will have an Elastic IP for its internet traffic; then whitelist the NAT Gateway's IP on your server side. Look for NAT Gateway in the AWS documentation.
1.2-Create a Virtual Private Gateway/VPN connection to your backend servers in your datacenter and route traffic through that.
1.2.a-Create your instances within a DEDICATED private subnet.
1.2.b-Whitelist the entire subnet on your side, any request from that subnet will be allowed in.
1.2.c Make sure your routes in the Subnet are correct.
(I'm skipping 2 on purpose since that is 1.2)
3.-The LAZY way:
Utilize AWS Opsworks to do two things:
1st: Allocate a RESOURCE Pool of Elastic IPs.
2nd: Start LOAD instances on demand and AUTO assign them one elastic ip from the Pool.
For the second part you will need to have the 24/7 instances be your minimum and the Load instances be your MAX. AWS Opsworks now allows Cloud Watch alarms to trigger instance startup so it is very similar to ASG.
The only disadvantage of OpsWorks is that instances aren't terminated but stopped when the load goes down, and that you must "create" instances beforehand. Also, you depend on Chef Solo to initialize your instances, but it is the only way I could find to auto-assign EIPs to newly created instances.
Cheers!
