Terraform starting EC2 sometimes stuck on "Still creating" until timeout

I am running Terraform through Jenkins to start an EC2 instance, which then runs a shell script via user_data. I run this job 23 times in parallel. Each time, a few of the runs (anywhere from 1 to 8, always at different indices) hang on "aws_instance.genomic-etl-ec2: Still creating...". They stay there until the connection times out after approximately an hour and throws a RequestExpired error, with no further details on why. The other instances each start within about 2-3 minutes.
My resource:
data "template_file" "my-user_data" {
  template = file("scripts/my_script.sh")
}

data "template_cloudinit_config" "my-user-data" {
  gzip          = true
  base64_encode = true

  # user_data
  part {
    content_type = "text/x-shellscript"
    content      = data.template_file.my-user_data.rendered
  }
}

resource "aws_instance" "genomic-etl-ec2" {
  ami                         = var.ami-id
  instance_type               = "m5.12xlarge"
  associate_public_ip_address = true
  subnet_id                   = var.my-subnet-us-east-id
  iam_instance_profile        = "my-deployment-profile"
  user_data                   = data.template_cloudinit_config.my-user-data.rendered

  vpc_security_group_ids = [
    aws_security_group.my-sg1.id,
    aws_security_group.my-sg2.id
  ]

  root_block_device {
    delete_on_termination = true
    encrypted             = true
    volume_size           = 1000
  }

  provisioner "local-exec" {
    command = "sleep 40"
  }

  tags = {
    Owner       = "Me"
    Environment = "development"
    # var.id: per-job identifier (assumed input variable; a bare "id" is not a valid reference)
    Name             = "My EC2 - ${var.id}"
    automaticPatches = "1"
  }
}
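Separately from the hang, the two data sources above come from the hashicorp/template provider, which is deprecated. The sketch below builds the same multipart user data with the built-in file() function and the cloudinit_config data source from the hashicorp/cloudinit provider. It reuses the question's script path. Whether this changes the hang behavior is untested. The sketch passes the value through user_data_base64 because the aws_instance documentation recommends that argument for gzip-compressed content.

```hcl
terraform {
  required_providers {
    cloudinit = {
      source = "hashicorp/cloudinit"
    }
  }
}

# Same multipart user data as above, without the deprecated template provider.
data "cloudinit_config" "my-user-data" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/x-shellscript"
    content      = file("${path.module}/scripts/my_script.sh")
  }
}

resource "aws_instance" "genomic-etl-ec2" {
  # ... other arguments as in the question ...

  # Gzipped content must stay base64-encoded, so pass it via user_data_base64
  # (this replaces the user_data argument; the two conflict).
  user_data_base64 = data.cloudinit_config.my-user-data.rendered
}
```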

Sometimes AWS instances take a long time to become fully available. It's not uncommon for them to take longer than Terraform's default timeout, which causes Terraform to fail.
According to the official documentation for the Terraform aws_instance resource, the create timeout defaults to 10 minutes. If a particular instance type takes longer than 10 minutes to become available, increase the create timeout:
resource "aws_instance" "genomic-etl-ec2" {
  # ...
  timeouts {
    create = "20m"
  }
}
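Because the job runs from Jenkins, you may find it convenient to make the timeout configurable instead of hard-coding it. This is a minimal sketch. It assumes a hypothetical instance_create_timeout variable, which is not part of the question, and the job overrides it with -var.

```hcl
# Hypothetical variable; override per run, e.g.
#   terraform apply -var 'instance_create_timeout=30m'
variable "instance_create_timeout" {
  type    = string
  default = "20m"
}

resource "aws_instance" "genomic-etl-ec2" {
  # ... arguments as in the question ...

  timeouts {
    create = var.instance_create_timeout
  }
}
```

A longer timeout only helps if the instance does eventually become available. It will not rescue a create request that never completes.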

Related

How to upload a local file to the EC2 instance with the module terraform-aws-modules/ec2-instance/aws?

I placed the provisioner inside module "ec2". It does not work.
I placed the provisioner outside of module "ec2". It does not work either.
I get the error: "Blocks of type "provisioner" are not expected here".
With "provisioner" inside module "ec2" (does not work):
module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "4.1.4"

  name                        = var.ec2_name
  ami                         = var.ami
  instance_type               = var.instance_type
  availability_zone           = var.availability_zone
  subnet_id                   = data.terraform_remote_state.vpc.outputs.public_subnets[0]
  vpc_security_group_ids      = [aws_security_group.sg_WebServerSG.id]
  associate_public_ip_address = true
  key_name                    = var.key_name

  provisioner "file" {
    source      = "./foo.txt"
    destination = "/home/ec2-user/foo.txt"

    connection {
      type        = "ssh"
      user        = "ec2-user"
      private_key = file("./keys.pem")
      host        = module.ec2.public_dns
    }
  }
}
With "provisioner" outside of module "ec2" (does not work either):
module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "4.1.4"

  name                        = var.ec2_name
  ami                         = var.ami
  instance_type               = var.instance_type
  availability_zone           = var.availability_zone
  subnet_id                   = data.terraform_remote_state.vpc.outputs.public_subnets[0]
  vpc_security_group_ids      = [aws_security_group.sg_WebServerSG.id]
  associate_public_ip_address = true
  key_name                    = var.key_name
}

provisioner "file" {
  source      = "./foo.txt"
  destination = "/home/ec2-user/foo.txt"

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("./keys.pem")
    host        = module.ec2.public_dns
  }
}
You can use a null_resource to make it work:
resource "null_resource" "this" {
  provisioner "file" {
    source      = "./foo.txt"
    destination = "/home/ec2-user/foo.txt"

    connection {
      type        = "ssh"
      user        = "ec2-user"
      private_key = file("./keys.pem")
      host        = module.ec2.public_dns
    }
  }
}
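As written, the null_resource runs its provisioner only once. If the module later replaces the instance, the file is not uploaded again. The sketch below ties the upload to the instance. It assumes Terraform 1.4 or later, for the built-in terraform_data resource, and that the module exposes the instance ID as module.ec2.id. The resource name upload_foo is hypothetical. On older Terraform versions, null_resource accepts an equivalent triggers map.

```hcl
resource "terraform_data" "upload_foo" {
  # Replace this resource (re-running its provisioner) whenever the
  # instance ID changes, i.e. when the module recreates the instance.
  triggers_replace = module.ec2.id

  provisioner "file" {
    source      = "./foo.txt"
    destination = "/home/ec2-user/foo.txt"

    connection {
      type        = "ssh"
      user        = "ec2-user"
      private_key = file("./keys.pem")
      host        = module.ec2.public_dns
    }
  }
}
```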
You can provision files on an EC2 instance with cloud-init's YAML syntax, which is passed to the EC2 instance as user data. Here is an example of passing a cloud-init config to EC2.
cloud-init.yaml file:
#cloud-config
# vim: syntax=yaml
#
# This is the configuration syntax that the write_files module
# will know how to understand. Encoding can be given b64 or gzip or (gz+b64).
# The content will be decoded accordingly and then written to the path that is
# provided.
#
# Note: Content strings here are truncated for example purposes.
write_files:
  - content: |
      # Your TXT file content...
      # goes here
    path: /home/ec2-user/foo.txt
    owner: ec2-user:ec2-user
    permissions: '0644'
Terraform file:
module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "4.1.4"

  name                        = var.ec2_name
  ami                         = var.ami
  instance_type               = var.instance_type
  availability_zone           = var.availability_zone
  subnet_id                   = data.terraform_remote_state.vpc.outputs.public_subnets[0]
  vpc_security_group_ids      = [aws_security_group.sg_WebServerSG.id]
  associate_public_ip_address = true
  key_name                    = var.key_name
  user_data                   = file("./cloud-init.yaml")
}
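If you would rather not maintain a separate YAML file, you can generate the same cloud-config inside Terraform with yamlencode(). This also avoids YAML indentation mistakes. This is a sketch. It assumes foo.txt sits next to the configuration, and the local name cloud_config is hypothetical. EC2 user data is limited to 16 KB, so this suits small files only.

```hcl
locals {
  # cloud-config structure equivalent to cloud-init.yaml above
  cloud_config = {
    write_files = [
      {
        path        = "/home/ec2-user/foo.txt"
        owner       = "ec2-user:ec2-user"
        permissions = "0644"
        content     = file("${path.module}/foo.txt")
      },
    ]
  }
}

module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "4.1.4"

  # ... other arguments as above ...

  # cloud-init only treats user data as cloud-config if the first line is "#cloud-config".
  user_data = "#cloud-config\n${yamlencode(local.cloud_config)}"
}
```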
The benefits of this approach over the approach in the accepted answer are:
This method creates the file immediately at instance creation, instead of having to wait for the instance to come up first. The null-provisioner/SSH connection method has to wait for the EC2 instance to become available, and the timing of that could make your Terraform workflow flaky.
This method doesn't require the EC2 instance to be reachable from your local computer that is running Terraform. You could be deploying the EC2 instance to a private subnet behind a load balancer, which would prevent the null-provisioner/SSH connect method from working.
This doesn't require you to have the SSH key for the EC2 instance available on your local computer. You might want to allow only AWS Systems Manager (SSM) connections to your EC2 instance, which is more secure than allowing SSH directly from the Internet. That would also prevent the null-provisioner/SSH connect method from working. Further, storing or referencing an SSH private key in your Terraform state adds a risk factor to your overall security profile.
This doesn't require a null_resource provisioner, about which the Terraform documentation says:
Important: Use provisioners as a last resort. There are better alternatives for most situations. Refer to Declaring Provisioners for more details.

Terraform throwing 'InvalidParameterValue': Address x.x.x.x does not fall within the subnet's address range... but it does

I'm having an issue with a Terraform deployment. I have a module that creates a network, and another module that creates a series of EC2 instances. These servers are required to have specific IP addresses, which are set in the module (I would rather set these dynamically, but for now they are hardcoded). However, I am getting an error that the IP address I am associating with the EC2 instance 'does not fall within the subnet's address range', but it does. Here is the basic breakdown:
servers/
    main.tf
    variables.tf
    outputs.tf
network/
    main.tf
    variables.tf
    outputs.tf
main.tf
The relevant bits are as follows:
network main.tf
# Create VPC
resource "aws_vpc" "foo" {
  cidr_block           = "192.168.1.0/24"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "foo"
  }
}

# Create a Subnet
resource "aws_subnet" "subnet-1" {
  vpc_id            = aws_vpc.foo.id
  cidr_block        = "192.168.1.0/24"
  availability_zone = "ca-central-1a"

  tags = {
    Name = "subnet-1"
  }
}
servers main.tf
resource "aws_instance" "bar" {
  ami                         = var.some_ami
  instance_type               = "t3.medium"
  associate_public_ip_address = true
  private_ip                  = "192.168.1.15"

  # root disk
  root_block_device {
    volume_size           = 60
    volume_type           = "gp3"
    encrypted             = true
    delete_on_termination = true
  }

  tags = {
    Name = "bar"
  }
}
main.tf
module "network" {
  source = "./network"
}

module "servers" {
  source    = "./servers"
  subnet_id = module.network.aws_subnet
}
Everything else works correctly, and I verified in AWS that the VPC and the subnet are created. But when the server is being created, I get the following error:
│ Error: creating EC2 Instance: InvalidParameterValue: Address 192.168.1.15 does not fall within the subnet's address range status code: 400
I left out some of the irrelevant parts of the .tf files, but everything else works as expected except this one thing. Does anyone know what's going on?
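Your aws_instance resource does not set the subnet_id argument. Without it, the instance is launched in a default subnet of the default VPC (which uses 172.31.0.0/16), and that range does not include 192.168.1.15.
Add the subnet_id argument as below: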
resource "aws_instance" "bar" {
  ami                         = var.some_ami
  instance_type               = "t3.medium"
  associate_public_ip_address = true
  # the subnet ID passed in by the root module (subnet_id = module.network.aws_subnet)
  subnet_id  = var.subnet_id
  private_ip = "192.168.1.15"

  # root disk
  root_block_device {
    volume_size           = 60
    volume_type           = "gp3"
    encrypted             = true
    delete_on_termination = true
  }

  tags = {
    Name = "bar"
  }
}
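If the subnet ID is passed into the servers module as the root main.tf does (subnet_id = module.network.aws_subnet), the instance can use subnet_id = var.subnet_id. That only works if the two modules are wired together. The question does not show outputs.tf or variables.tf, so the following is an assumed sketch that matches the names used in the root main.tf:

```hcl
# network/outputs.tf: expose the subnet ID under the name the root module reads
output "aws_subnet" {
  value = aws_subnet.subnet-1.id
}

# servers/variables.tf: receive it as an input variable
variable "subnet_id" {
  type = string
}
```

Passing the ID through module outputs also gives Terraform the dependency it needs, so the instance is created only after the subnet exists.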
You could also use a data source to look up the subnet ID:
data "aws_subnet" "selected" {
  filter {
    name   = "tag:Name"
    values = ["myawesomesubnet"]
  }
}
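Building on the data source, you can derive both the subnet ID and the hardcoded private IP from the subnet itself, so the address cannot fall outside the subnet's range. This sketch points the filter at this question's subnet (Name tag "subnet-1"). cidrhost() returns the given host number within a CIDR block. AWS reserves the first four addresses and the last address of every subnet, so those host numbers cannot be assigned. The data source can only find a subnet that already exists when Terraform reads it. If the subnet is created in the same run, passing the ID in as a variable is more reliable.

```hcl
data "aws_subnet" "selected" {
  filter {
    name   = "tag:Name"
    values = ["subnet-1"] # Name tag of the subnet in the question
  }
}

resource "aws_instance" "bar" {
  ami                         = var.some_ami
  instance_type               = "t3.medium"
  associate_public_ip_address = true
  subnet_id                   = data.aws_subnet.selected.id

  # Host 15 of the subnet's CIDR; for 192.168.1.0/24 this is "192.168.1.15".
  private_ip = cidrhost(data.aws_subnet.selected.cidr_block, 15)
}
```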

Terraform: Cannot create spot instance. Error: MaxSpotInstanceCountExceeded

I am trying to create a spot instance in Terraform. The Terraform code appears to be fine, but I keep getting an error back saying MaxSpotInstanceCountExceeded.
NOTE: Right now this is just a test, so I am not including security groups, IPs, etc.
Steps I have taken:
Checked that I have 0 spot instance requests created in console.
Tried logging in to the console and creating a spot instance request. It works just fine.
Cancelled the spot instance request to ensure that I now have 0 spot instance requests.
Then tried to create virtually the same spot instance with the Terraform script below, but got the error: MaxSpotInstanceCountExceeded
Does anyone know why Terraform (or maybe AWS?) is not allowing me to create the spot instance using the terraform script, but it works just fine from the console?
Thanks!
provider "aws" {
  profile = "terraform_enterprise_user"
  region  = "us-east-2"
}

resource "aws_spot_instance_request" "MySpotInstance" {
  # Spot Request Settings
  wait_for_fulfillment            = true
  spot_type                       = "persistent"
  instance_interruption_behaviour = "stop"

  # Instance Settings
  ami                         = "ami-0520e698dd500b1d1"
  instance_type               = "c4.large"
  associate_public_ip_address = true

  root_block_device {
    volume_size = 10
    volume_type = "standard"
  }

  ebs_block_device {
    device_name           = "/dev/sdb"
    volume_size           = 50
    volume_type           = "standard"
    delete_on_termination = true
  }

  tags = {
    Name        = "MySpotInstance"
    Application = "MyApp"
    Environment = "TEST"
  }
}

Decrypting Windows Password in terraform

I'm trying to set up a Terraform script to deploy a Windows server. When running terraform apply, I get the error message below:
Error: Invalid reference
on main.tf line 44, in resource "aws_instance" "server":
44: password = "${rsadecrypt(aws_instance.server[0].password_data, file(KEY_PATH))}"
A reference to a resource type must be followed by at least one attribute
access, specifying the resource name.
AFAIK the resource type is "aws_instance", the name is "server[0]" and the attribute is "password_data". I know I'm missing something but don't know what. Any assistance would be appreciated.
The full resource block is below in case there is something I'm missing in there.
Thanks
resource "aws_instance" "server" {
  ami                    = var.AMIS[var.AWS_REGION]
  instance_type          = var.AWS_INSTANCE
  vpc_security_group_ids = [module.networking.security_group_id_out]
  subnet_id              = module.networking.subnet_id_out

  ## Use this count key to determine how many servers you want to create.
  count    = 1
  key_name = var.KEY_NAME

  tags = {
    # Name = "Server-Cloud"
    Name = "Server-${count.index}"
  }

  root_block_device {
    volume_size           = var.VOLUME_SIZE
    volume_type           = var.VOLUME_TYPE
    delete_on_termination = true
  }

  get_password_data = true

  provisioner "remote-exec" {
    connection {
      host = coalesce(self.public_ip, self.private_ip)
      type = "winrm"
      ## Need to provide your own .pem key that can be created in AWS or on your machine for each provisioned EC2.
      password = "${rsadecrypt(aws_instance.server[0].password_data, file(KEY_PATH))}"
    }

    inline = [
      "powershell -ExecutionPolicy Unrestricted C:\\Users\\Administrator\\Desktop\\installserver.ps1 -Schedule",
    ]
  }

  provisioner "local-exec" {
    command = "echo ${self.public_ip} >> ../public_ips.txt"
  }
}
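A resource cannot refer to itself by name from inside its own block, so use self instead: password = rsadecrypt(self.password_data, file("/root/.ssh/id_rsa")). Also leave out user = "admin", because for WinRM connections the user defaults to Administrator: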
resource "aws_instance" "windows_server" {
  get_password_data = true

  connection {
    host     = self.public_ip
    type     = "winrm"
    https    = false
    password = rsadecrypt(self.password_data, file("/root/.ssh/id_rsa"))
    agent    = false
    insecure = true
  }
}
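The "Invalid reference" message in the question comes from file(KEY_PATH). Terraform parses a bare identifier such as KEY_PATH as a resource type, which is why it says "A reference to a resource type must be followed by at least one attribute access". It needs the var. prefix. The sketch below shows the question's resource with both fixes applied. It assumes KEY_PATH is an input variable holding the path to the key pair's PEM private key. The declaration is shown for completeness; omit it if variables.tf already has one.

```hcl
variable "KEY_PATH" {
  type        = string
  description = "Path to the PEM private key of the EC2 key pair (assumed)"
}

resource "aws_instance" "server" {
  # ... ami, instance_type, subnet, tags, root_block_device as in the question ...
  count             = 1
  key_name          = var.KEY_NAME
  get_password_data = true

  provisioner "remote-exec" {
    connection {
      host = coalesce(self.public_ip, self.private_ip)
      type = "winrm"
      # user is omitted: it defaults to "Administrator" for WinRM.
      # self = this instance; var.KEY_PATH replaces the bare KEY_PATH.
      password = rsadecrypt(self.password_data, file(var.KEY_PATH))
    }

    inline = [
      "powershell -ExecutionPolicy Unrestricted C:\\Users\\Administrator\\Desktop\\installserver.ps1 -Schedule",
    ]
  }
}
```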

Terraform and AWS spots instances

As described in https://github.com/hashicorp/terraform/issues/17429
After 7 days the spot request is cancelled while the instance keeps running, so when I run "terraform apply" it tries to create a new spot request. This happens with AWS provider >= 1.13.0.
I'm using AWS provider 1.32.0. Does anyone know a workaround for this issue? For future installations I will use the valid_until argument, which extends the request lifetime, but what about the spot instances that are already running?
Thanks
resource "aws_spot_instance_request" "cheap_worker" {
  count                       = "${var.kube_master_spot_num}"
  ami                         = "${data.aws_ami.nat_ami.id}"
  availability_zone           = "${element(slice(data.aws_availability_zones.available.names,var.kube_master_on_demand_num,var.availability_zones_num),count.index)}"
  spot_price                  = "3"
  instance_type               = "${var.kube_master_type}"
  subnet_id                   = "${element(module.aws-vpc.aws_subnet_ids,count.index + var.kube_master_on_demand_num)}" # adjusting to a case with spots & on-demand servers
  vpc_security_group_ids      = ["${module.aws-vpc.cluster_sg_id}", "${module.aws-vpc.route53_sg_id}"]
  key_name                    = "${basename(var.local_ssh_key)}"
  associate_public_ip_address = true
  root_block_device           = [{volume_type="gp2",volume_size="50",delete_on_termination=true}]
  spot_type                   = "one-time"
  wait_for_fulfillment        = true

  tags {
    Name = "${var.kube_identify}-${var.kube_type}-master-${count.index}"
  }

  provisioner "local-exec" {
    command = "sleep 120"
  }

  connection {
    type        = "ssh"
    user        = "ubuntu"
    private_key = "${file(var.local_ssh_key)}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get update"
    ]
  }
}
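For new requests, the valid_until argument mentioned in the question sets the request's end date as a UTC RFC 3339 timestamp (the default is 7 days after creation). This is a minimal sketch in the question's style, and the date is only a placeholder. EC2 spot requests cannot be modified in place, so adding this to an existing request will most likely plan a replacement. On its own, it does not help the spot instances that are already running.

```hcl
resource "aws_spot_instance_request" "cheap_worker" {
  # ... existing arguments from the question ...

  # End of the request's validity (UTC, RFC 3339); placeholder date.
  valid_until = "2030-01-01T00:00:00Z"
}
```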
