Lifecycle of an EC2 Container Service Instance - amazon-ec2

In my project I have a constraint where all of the traffic received will go to a certain IP. The Elastic IP feature works well for this.
My question is, considering we are using Amazon's docker service (ECS) without autoscaling (so instances/tasks will be scaled manually), can we treat the instances created by the ECS service as we would treat normal, on-demand instances? As in they won't be terminated/stopped unless explicitly done by a user (or API call or whatever).

As is mentioned in the Scaling a Cluster documentation, if you created your cluster after November 24th, 2015 using either the first run wizard, or the Create Cluster wizard, then an Autoscaling group would have been created to manage the instances backing your cluster.
For the most part, the answer to your question is Yes. The instances wouldn't normally go about getting replaced. It is also important to note that because this is backed by an auto scaling group, AutoScaling might go about Replacing unhealthy instances for you. If an instance fails it EC2 Health Checks for some reason, it will be marked as unhealthy, and scheduled for replacement.
By default, my understanding is there are no CloudWatch Alarms or Scaling Policies effecting this AutoScaling group, so it should just be when an instance becomes healthy that it would get replaced.

Related

How to create a bash script for autoscaling EC2 instances given the work volume of a SQS?

I created a bash script with aws-cli that sends 1000 messages using SQS, now I want to create another one that runs in parallel and creates and destroys EC2 instances given this condition:
Checks every 15 seconds: if (((ApproximateNumberOfMessages + 9)/10) - N running instances) > 0 creates an instance, else destroys an instance.
My first problem is that I don't know how to connect my SQS queue to a EC2 instance so it can process these messages. I tried following this tutorial: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-sending-messages-from-vpc.html, but I don't want to use a private VPC and security groups so I was wondering if there is a way to make it easier.
My questions are: Is it possible to do it just using a bash script instead of CloudWatch and Autoscaling Groups? How do I create a EC2 instance that is ready to process these messages?
When you create an EC2 instance, it automatically gets an Elastic Network Interface (ENI, a virtual network card) for which AWS automatically assigns either a default security group, either another user created security group. You can not detach the default ENI, also you cannot have an ENI without a security group. Moreover, EC2 instances have to run inside a VPC, which can be private of public. Nevertheless, if you work with EC2 instances, you have to deal with security groups as well.
Is it possible to do it just using a bash script instead of CloudWatch and Autoscaling Groups?
It might be possible, but you will find yourself reinventing the wheel. Autoscaling does more than just adding/removing instances based on some condition. For example, it also makes sure that your instances are replaced if they become unhealthy or if they are terminated for some reason. For more info see AWS ASG FAQ.
How do I create a EC2 instance that is ready to process these messages?
You can't just start an instance and expect to process your messages. You need to have some code or some kind software deployed to it and configured to poll messages from your queues.

Monitoring EBS volumes for istances with CloudWatch Agent and CDK

I'm trying to set up a way to monitor disk usage for instances belonging to an AutoScaling Group, and add an alarm when the volumes associated to the instances are almost full.
Since it seems there are no metrics normally offered by Amazon to do that, I resorted using the CloudWatch Agent to get what I wanted. So far so good, I can create graphs and alarms for the metrics I want using the CloudWatch console.
My issue is how to automate everything with CDK. How can I automate the creation of the metric for each instance, without knowing the instance id beforehand? Is there a solution for this issue?
You can install and config CloudWatch agent via EC2 user data and the auto scaling group uses launch template to launch EC2 instance. All of those things can be done by AWS CDK.
There is an example from this open source project for your reference.
Another approach you could take is using AWS Systems Manager. Essentially, you install an SSM agent for your instances, and create an SSM Document (think Shell/Python script) that will run your setup script/automation.
You then create a State Manager Association, tying the SSM Document with your instances based on EC2 tags e.g. Application=MyApp or Team=MyTeam. This way, you don't have to provide any resource ids, just the tag key value pair which could extend multiple instances and future instance replacements. You can schedule it to run at specific times (cron) or at a certain frequency (rate) to enforce state.

How to replace ECS cluster instances without downtime or reduced redundancy?

I currently have a try-out environment with ~16 services divided over 4 micro-instances. Instances are managed by an autoscaling group (ASG). When I need to update the AMI of my cluster instances, currently I do:
Create new launch config, edit ASG with new launch config.
Detach all instances with replacement option from the ASG and wait until the new ones are listed in the cluster instance list.
MANUALLY find and deregister the old instances from the ECS cluster (very tricky)
Now the services are killed by ECS due to deregistering the instances :(
Wait 3 minutes until the services are restarted on the new instances
MANUALLY find the EC2 instances in the EC2 instance list and terminate them (be very very careful not to terminate the new ones).
With this approach I have about 3 minutes of downtime and I shiver from the idea to do this in production envs.. Is there a way to do this without downtime but keeping the overall amount of instance the same (so without 200% scaling settings etc.).
You can update the Launch Configuration with the new AMI and then assign it to the ASG. Make sure to include the following in the user-data section:
echo ECS_CLUSTER=your_cluster_name >> /etc/ecs/ecs.config
Then terminate one instance at a time, and wait until the new one is up and automatically registered before terminating the next.
This could be scriptable to be automated too.

Change Instance type of a cluster registered ec2 instance

I have an Amazon EC2 instance which is registered to a cluster of Amazon ECS.
And I want to change this instance's type from c4.large to c4.8xlarge.
I'm able to change its type from c4.large to c4.8xlarge in AWS console. But after the change, I found
[ERROR] Could not register module="api client" err="ClientException: Container instance type changes are not supported. Container instance XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX was previously registered as c4.large.
being printed in /var/log/ecs/ecs-agent.log.20XX-XX-XX-XX file.
Is it possible to change ec2 instance type and re-register it to a cluster?
I think maybe deregister it first, then register it again should work. But I'm afraid this may cause something irreversible in my AWS working environment. So I haven't tried this method yet.
To solve this connection problem between the agent and cluster, just delete the file /var/lib/ecs/data/ecs_agent_data.json and restart docker and ECS.
After that, a new container instance will be created in your cluster with the new size.
sudo rm /var/lib/ecs/data/ecs_agent_data.json
sudo service docker restart
sudo start ecs
Then you can go to the ECS cluster console and deregister the old container instance
UPDATE:
According to #florins and #MBear commented below, AWS updated the data file on ECS instances.
sudo rm /var/lib/ecs/data/agent.db
sudo service docker restart
sudo start ecs
As of March 2021 / AMI image ami-0db98e57137013b2d, /var/lib/ecs/data/ecs_agent_data.json mentioned in the last useful answer does not exist. For me, the commands to execute on the changed instance were:
sudo rm /var/lib/ecs/data/agent.db
sudo service docker restart
After that, it was possible to deploy containers to the instance, without fresh registration (AWS automatically registered a second ECS container instance of the new type). I did have a leftover container instance with the resources of the old instance type to remove.
You can't do this. Per their docs:
The type of EC2 instance that you choose for your container instances determines the resources available in your cluster. Amazon EC2 provides different instance types, each with different CPU, memory, storage, and networking capacity that you can use to run your tasks. For more information, see Amazon EC2 Instances.
This means that when you launch a container on an instance, the agent gathers a bunch of metadata about the instance to run it. If you change it, all of that metadata (or a lot) has changed in a bad way. CPU units, memory, etc. The agent is aware of this and will report it as an error.
You should spin up a new instance of the new type and register it to the cluster and let the task run on it. If it's a service, just terminate the old instance and let it run it against the new one.
I can't think of any real reason why terminating your old instance would cause something irreversible unless it is misconfigured or fragile via user specific settings, by default this would not cause anything destructive.
As alternative approach if the EC2 instance does not store any valuable a new instance using the old instance as template could be started. This takes all existing values and can be achieved just with a few clicks in minutes.
Select the EC2 instance and then "Actions -> Images and templates -> Start more like this". Just change the instance type.
When the instance is running got the the ECS cluster to the tab "ECS instances" and activate the new created instance.
Shutdown the old instance
Update your task maybe taking more cpu and memory and update the service to take the new task revision

How can I connect my autoscaling group to my ecs cluster?

In all tutorials for ECS you need to create a cluster and after that an autoscaling group, that will spawn instances. Somehow in all these tutorials the instances magically show up in the cluster, but noone gives a hint what's connecting the autoscaling group and the cluster.
my autoscaling group spawns instances as expected, but they just dont show up on my ecs cluster, who holds my docker definitions.
Where is the connection I'm missing?
I was struggling with this for a while. The key to getting the instances in the autoscaling group associated with your ECS cluster is in the user data. When you are creating your launch config when you get to step 3 "Configure Details" hit the advanced tab and enter a simple bash script like the following for your user data.
#!/usr/bin/env bash
echo ECS_CLUSTER=your_cluster_name >> /etc/ecs/ecs.config
All the available parameters for agent configuration can be found here http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html
An autoscaling group is not strictly associated to a cluster. However, an autoscaling group can be configured such that each instance launched registers itself into a particular cluster.
Registering an instance into a cluster is the responsibility of the ECS Agent running on the instance. If you're using the Amazon ECS-optimized AMI, the ECS Agent will launch when the instance boots and register itself into the configured cluster. However, you can also use the ECS Agent on other Linux AMIs by following the installation instructions.
Well, i found out.
Its all about the ecs-agent and its config file /etc/ecs/ecs.config
(This file will be created through the Userdata field, when creating EC2 instances, even from an autoscaling configuration.)
Read about its configuration options here: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html
But you can even copy a ecs.config stored on Amazon S3, do it like this (following lines go into Userdata field):
#!/bin/bash
yum install -y aws-cli
aws configure set default.s3.signature_version s3v4
aws configure set default.s3.addressing_style path
aws configure set default.region eu-central-1
aws s3 cp s3://<bucketname>/ecs.config /etc/ecs/ecs.config
note: Signature_version v4 is specific for some regions, like eu-central-1.
This ofc only works, if your IAM role for the instance (in my case its ecsInstanceRole) has the right AmazonS3ReadOnlyAccess
The AWS GUI console way for that would be:
Use the cluster wizard at https://console.aws.amazon.com/ecs/home#/firstRun .
It will create an autoscaling grou for your cluster, a loadbalancer in front of it, and connect it all nicely.
This question is old but the answer is not complete. There are 2 parts to getting your own auto-scaling group to show up in your cluster (as of Jan 2022).
You need to ensure your cluster name is set for ECS_CLUSTER variable in /etc/ecs/ecs.config as mentioned in this answer: https://stackoverflow.com/a/35324937/583875
You need to create a new capacity provider for the cluster and attach this auto scaling group. To do this, go to Cluster -> Capacity Provider -> Create -> Select your auto scaling group under Auto Scaling group.
Another tricky part is getting your service to use the instances (if you have a service running). You need to edit the Service, and change the Capacity provider strategy. Click on Add another provider and choose the new capacity provider you created in (2) above.
That's all! To ensure things are working properly: you should see your capacity provider under Graph -> Capacity Providers and you should see instances from your auto scaling group under Graph -> ECS Instances.

Resources