Terraform - having timing issues launching EC2 instance with instance profile - amazon-ec2

I'm using Terraform to create my AWS infrastructure.
I have a module that creates an "aws_iam_role", an "aws_iam_role_policy", and an "aws_iam_instance_profile", and then launches an EC2 instance with that aws_iam_instance_profile.
"terraform plan" works as expected, but with "terraform apply" I consistently get this error:
* aws_instance.this: Error launching source instance: InvalidParameterValue: IAM Instance Profile "arn:aws:iam::<deleted>:instance-profile/<deleted>" has no associated IAM Roles
If I immediately rerun "terraform apply", it launches the EC2 instance with no problem. If I run a "terraform graph", it does show that the instance is dependent on the profile.
Since the second "apply" is successful, that implies that the instance_policy and all that it entails is getting created correctly, doesn't it?
I've tried adding a "depends_on" and it doesn't help, but since the graph already shows the dependency, I'm not sure that is the way to go anyway.
Anyone have this issue?

Race conditions between AWS services are quite common - state is often only eventually consistent at that scale. This is particularly true with IAM: you will often create a role and give a service such as EC2 a trust relationship to use the role for an EC2 instance, but because of how IAM is propagated throughout AWS, the role will not be visible to the EC2 service for a few seconds after creation.
The solution I have used, which is not a great one but gets the job done, is to put the following provisioner on every single IAM role or policy attachment to give the change time to propagate:
resource "aws_iam_role" "some_role" {
...
provisioner "local-exec" {
command = "sleep 10"
}

In this case you can also use operation timeouts. Timeouts are handled entirely by the resource type implementation in the provider, but resource types offering this feature follow the convention of defining a child block called timeouts, with a nested argument named after each operation that has a configurable timeout value. Each of these arguments takes a string representation of a duration, such as "60m" for 60 minutes, "10s" for ten seconds, or "2h" for two hours.
resource "aws_db_instance" "example" {
# ...
timeouts {
create = "60m"
delete = "2h"
}
}
Ref: https://www.terraform.io/docs/configuration/resources.html

Related

Terraform: How to fetch or destroy resources created by other means?

Sometimes I end up creating resources using the AWS console, due to errors in Terraform or lack of time. Can I list all my resources and destroy them? Basically, a discovery of existing cloud resources and management of such?
Ex: list my EC2 instances using Terraform and destroy them when needed. How do I achieve this?
Terraform is designed to ignore any existing objects that it didn't create, because otherwise it would be risky to adopt Terraform in an existing system with many existing objects, and it would be impossible to decompose the infrastructure into separate configurations for each subsystem without each one trying to destroy the objects managed by the others.
Terraform doesn't have any facility for automatically detecting objects created outside of Terraform, but you can explicitly bind specific objects from your remote system to resource instances in your Terraform configuration using the terraform import command.
That command has some safeguards to try to prevent you from accidentally deleting an object you've just imported if, for example, you make a typo in the resource instance address. Unfortunately, that design is contrary to your goal: it won't let you just import something and run terraform apply to destroy it.
Instead, you'd need to:
Write a stub empty resource block for a resource of the appropriate type in your configuration.
Run terraform import to bind your existing real object to that empty resource block.
After the import succeeds, immediately remove the resource block to tell Terraform that you intend to delete the object.
Run terraform apply, and then Terraform should notice that it's tracking an object that is no longer mentioned in the configuration and propose to delete it.
Terraform is not the best tool for this job: it has essentially been designed to do the exact opposite of what you want, since users typically want to avoid destroying untracked objects so as not to disrupt neighboring systems.
However, you may be able to get the effect you want with some custom programming on your part, by writing a program that does something like the following:
Run terraform show -json in all of your configuration working directories to obtain a machine-readable description of the Terraform state in each one.
Decode the JSON state descriptions to find all of the resource instances of type aws_instance and collect a set of all of their id attribute values. This is the set of instances to keep.
Call the EC2 API DescribeInstances action to retrieve a list of all of the instances that actually exist. Collect a set of all of their IDs. This is the set of instances that exist.
Set-subtract the set of instances to keep from the set of instances that exist. The result is the set of instances to destroy.
If the set of instances to destroy isn't empty, call the EC2 API's TerminateInstances action to terminate every instance ID in that set.
This description is specific to Amazon EC2 instances. The same pattern could apply to objects of any other type, but there is no general solution that will work across all object types at once because the AWS API design doesn't work that way: each object type has its own separate operations for querying which objects exist and for destroying a particular object or set of objects.
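To make that concrete, here's a minimal Python sketch of the EC2-specific flow described above, under some assumptions: a single working directory, only root-module aws_instance resources (child modules are not walked), and boto3 credentials already configured. Treat it as a starting point rather than a finished tool.
import json
import subprocess

import boto3

# The set of instance IDs Terraform is tracking (the "keep" set).
state = json.loads(subprocess.check_output(["terraform", "show", "-json"]))
keep = set()
for resource in state.get("values", {}).get("root_module", {}).get("resources", []):
    if resource["type"] == "aws_instance":
        keep.add(resource["values"]["id"])

# The set of instance IDs that actually exist (the "exist" set),
# ignoring instances that are already terminated or shutting down.
ec2 = boto3.client("ec2")
exist = set()
pages = ec2.get_paginator("describe_instances").paginate(
    Filters=[{"Name": "instance-state-name",
              "Values": ["pending", "running", "stopping", "stopped"]}]
)
for page in pages:
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            exist.add(instance["InstanceId"])

# Everything that exists but is not kept is a candidate for destruction.
to_destroy = exist - keep
if to_destroy:
    ec2.terminate_instances(InstanceIds=sorted(to_destroy))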

TerraForm: deploying EC2 instances without starting them

I want to deploy my infrastructure in different AWS environments (dev, prod, qa).
That deployment creates a few EC2 instances from a custom AMI. When deployed, the instances are in the "running" state. I understand this seems to be related to some constraint in the EC2 API. However, I don't necessarily want my instances started, depending on context. Sometimes, I just want the instances to be created, and they will be started later on. I guess this is quite a common scenario.
Reading the few related issues/requests on Hashicorp's GitHub makes me think so:
Terraform aws instance state change
Stop instances
aws_instance should allow to specify the instance state
There must be some TerraForm-based solution which doesn't require relying on the AWS CLI / CDK or a Lambda, right? Something in the TerraForm script that, for example, would stop the instance right after its creation.
My Google-fu didn't help me much here. Any help / suggestion for dealing with that scenario is welcome.
Provisioning a new instance automatically puts it in a 'started' state.
As Marcin suggested, you can use user data scripts. Here's some pseudo user data script - the actual implementation is left for you to figure out ;)
#!/bin/bash
# get the instance id and pass it to the line below
aws ec2 stop-instances --instance-ids i-1234567890abcdef0
You can read about running scripts as part of the bootstrapping here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
Basically it's all up to your use case; we don't do this generally. Still, if you want to provision your EC2 instances and then put them in a stopped state, as bschaatsbergen suggested, you can use user_data in Terraform. Make sure to attach a role with the relevant permission.
#!/bin/bash
INSTANCE_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id/`
REGION=`curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/.$//'`
aws ec2 stop-instances --instance-ids $INSTANCE_ID --region $REGION
As already stated by others, you cannot just "create" instances; they will be in the "started" state.
Rather, I would ask what the exact use case is here:
Sometimes, I just want the instances to be created, and they will be started later on.
Why do you have to create instances now and use them later? Can't they be created exactly when they are required? Is there a specific requirement to keep them initialized before they are used? Or do the instances take a long time to start?

AWS SSM describe_instance_information using old data?

I have a lambda function that is running:
ssm = boto3.client('ssm')
print(ssm.describe_instance_information())
It returns 6 instances. 5 are old instances that have been terminated and no longer show up in my console. One instance is correct. I created an AMI image of that instance and tried launching several instances under the same security group and subnet. None of those instances are returned from describe_instance_information. Is it reporting old data?
My end goal is to have the lambda function launch an instance using the AMI and send a command to it. Everything works if I use the existing instance. I am trying to get it to work with one created from the AMI.
EDIT:
After a while, the instances did show up - I guess it takes a while. I don't understand why terminated instances still show up. I can poll describe_instance_information until the instance_id I want shows up, but is there a cleaner built-in function, like wait_for_xxxxx()?
You can use the Filters parameter with PingStatus, which reflects the connection status of the SSM Agent:
response = client.describe_instance_information(
    Filters=[
        {
            'Key': 'PingStatus',
            'Values': [
                'Online'
            ]
        },
    ]
)
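As for a cleaner built-in wait: I'm not aware of a waiter for SSM instance registration, so a small polling loop around that filtered call is a reasonable stopgap. A rough sketch, assuming you already know the ID of the newly launched instance (the ID below is a placeholder):
import time

import boto3

ssm = boto3.client('ssm')
instance_id = 'i-1234567890abcdef0'  # placeholder for the instance you just launched

# Poll until the SSM Agent on the new instance reports in as Online.
while True:
    info = ssm.describe_instance_information(
        Filters=[
            {'Key': 'InstanceIds', 'Values': [instance_id]},
            {'Key': 'PingStatus', 'Values': ['Online']},
        ]
    )['InstanceInformationList']
    if info:
        break
    time.sleep(15)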

Getting Around Terraform's Limitations

I'm trying to set up Terraform to handle creation of fine-grained user permissions, and have been able to create:
Cognito User Pools, Identity Pools
IAM Roles, Permissions
What I'm struggling with is how to link them together. I have two types of user:
Standard User
Manager
As such, I have found two ways to hook up the correct IAM policy upon login:
Method 1 - Create a custom attribute, and use "Choose Role With Rules" to set a rule that assigns an IAM policy based on the attribute
Method 2 - Create Cognito Groups, and link users and the required IAM policy to each group.
The problem, as far as I can see, is that Terraform doesn't currently support either of those cases, so I need to find a workaround. So, my question is essentially: how do I get around Terraform's lack of support in some areas?
I've seen some projects that use [Ruby, Go, etc.] to make up for some of the limitations, but I don't quite understand where to start and what is the best option for my needs. I haven't been able to find much in Google yet (possibly https://github.com/infrablocks/ruby_terraform). Does anyone have a good guide or resource I could use to get started?
If Terraform does not support something, you can use the local-exec provisioner to execute commands after resource creation. For example, you could use the AWS CLI to add a custom attribute:
resource "aws_cognito_identity_pool" "main" {
# ...
provisioner "local-exec" {
command = "aws cognito-idp add-custom-attributes --user-pool-id ${aws_cognito_identity_pool.main.id} --custom-attributes <your attributes>"
}
}
local-exec docs

EC2: Waiting until a new instance is in running state

I would like to create a new instance based on my stored AMI.
I achieve this by the following code:
RunInstancesRequest rir = new RunInstancesRequest(imageId,1, 1);
// Code for configuring the settings of the new instance
...
RunInstancesResult runResult = ec2.runInstances(rir);
However, I cannot find a way to "block"/wait until the instance is up and running, apart from a Thread.currentThread().sleep(xxxx) call.
On the other hand, StartInstancesResult and TerminateInstancesResult give you a way to access the state of the instances and monitor any changes. But what about the state of a completely new instance?
boto3 has:
instance.wait_until_running()
From the boto3 docs:
Waits until this Instance is running. This method calls EC2.Waiter.instance_running.wait() which polls EC2.Client.describe_instances() every 15 seconds until a successful state is reached. An error is returned after 40 failed checks.
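For example, a minimal boto3 sketch using the resource-level API (the AMI ID and instance type below are placeholders):
import boto3

ec2 = boto3.resource('ec2')

# Launch one instance from the stored AMI (placeholder values).
instances = ec2.create_instances(
    ImageId='ami-12345678',
    InstanceType='t3.micro',
    MinCount=1,
    MaxCount=1,
)

instance = instances[0]
# Block until EC2 reports the instance as running, then refresh its attributes.
instance.wait_until_running()
instance.reload()
print(instance.state['Name'])  # 'running'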
From the AWS CLI changelog for v1.6.0:
Add a wait subcommand that allows for a command to block until an AWS
resource reaches a given state (issue 992, issue 985)
I don't see this mentioned in the documentation, but the following worked for me:
aws ec2 start-instances --instance-ids "i-XXXXXXXX"
aws ec2 wait instance-running --instance-ids "i-XXXXXXXX"
The wait instance-running line did not finish until the EC2 instance was running.
I don't use Python/boto/botocore, but I assume it has something similar. Check out waiter.py on GitHub.
Waiting for the EC2 instance to get ready is a common pattern. In the Python library boto you can also solve this with sleep calls:
import time

reservation = conn.run_instances([Instance configuration here])
instance = reservation.instances[0]
while instance.state != 'running':
    print '...instance is %s' % instance.state
    time.sleep(10)
    instance.update()
With this mechanism you can poll until your new instance comes up.
Depending on what you are trying to do (and how many servers you plan on starting), instead of polling for the instance start events, you could install on the AMI a simple program/script that runs once when the instance starts and sends out a notification to that effect, i.e. to an AWS SNS Topic.
The process that needs to know about new servers starting could then subscribe to this SNS topic and would receive a push notification each time a server starts.
Solves the same problem from a different angle; your mileage may vary.
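A rough sketch of the instance-side half of that pattern - a boot-time script that publishes its own instance ID to an SNS topic. The topic ARN is a placeholder, and the instance profile would need sns:Publish permission on it:
import urllib.request

import boto3

# Read this instance's ID from the instance metadata service.
instance_id = urllib.request.urlopen(
    'http://169.254.169.254/latest/meta-data/instance-id', timeout=2
).read().decode()

# Announce that this server has started (placeholder topic ARN).
sns = boto3.client('sns')
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:server-started',
    Subject='EC2 instance started',
    Message=instance_id,
)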
Use Boto3's wait_until_running method:
http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.Instance.wait_until_running
You can use boto3 waiters:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#waiters
For this case, for example: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Waiter.InstanceRunning
Or in Java https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/
I am sure there are waiters implemented in all the AWS SDKs.
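In boto3 that looks roughly like this, using the client-level waiter mentioned above (the instance ID is a placeholder):
import boto3

ec2 = boto3.client('ec2')

# Block until describe_instances reports the instance as running.
waiter = ec2.get_waiter('instance_running')
waiter.wait(InstanceIds=['i-1234567890abcdef0'])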
