Does the ECS update-service command mark the container instance state as DRAINING when used with the --force-new-deployment option? - amazon-ec2

The command:
aws ecs update-service --service my-http-service --task-definition amazon-ecs-sample --force-new-deployment
As per AWS docs: You can use this option (--force-new-deployment) to trigger a new deployment with no service definition changes. For example, you can update a service's tasks to use a newer Docker image with the same image/tag combination (my_image:latest) or to roll Fargate tasks onto a newer platform version.
My question: if I use '--force-new-deployment' (as I will use the existing tag or definition), will the underlying 'ECS instance' automatically be set to the DRAINING state, so that any new task (if any) will not start on the EXISTING ecs-instance that is supposed to go away during the rolling-update deployment strategy (or deployment controller)?
In other words, is there any chance:
For a new task to be created on the existing/old container instance that is supposed to go away during the rolling update.
Also, what would happen to the ongoing tasks that are running on this existing/old container instance that is supposed to go away during the rolling update?
Ref: https://docs.aws.amazon.com/cli/latest/reference/ecs/update-service.html

Please note that no container instance is going anywhere with this 'update-service' command. The command only creates a new deployment under the ECS service and, once the new tasks become healthy, removes the old task(s).
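For example, a quick way to observe this (a sketch; the cluster name 'default' is an assumption, the service name is the one from the question) is to watch the deployments roll over and confirm the container instances stay ACTIVE rather than being moved to DRAINING:
# Trigger a new deployment without changing the task definition
aws ecs update-service --cluster default --service my-http-service --force-new-deployment
# Watch the PRIMARY/ACTIVE deployments roll over; tasks are replaced, instances are not
aws ecs describe-services --cluster default --services my-http-service --query 'services[0].deployments[*].{status:status,running:runningCount,desired:desiredCount}'
# Confirm no container instance has been set to DRAINING (this should return an empty list)
aws ecs list-container-instances --cluster default --status DRAINING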
Edit 1:
What about the requests that were being served by the old tasks?
I am assuming the tasks are behind an Application Load Balancer. In this case, old tasks will be deregistered from the ALB.
Note: In the following discussion, target is the ECS Task.
To give you a brief description of how the deregistration delay works with ECS, the following is the sequence of events when a task attached to an ALB is stopped. This can be due to a scale-in event, deployment of a new task definition, a decrease in the number of tasks, a forced deployment, etc.
ECS sends a DeregisterTargets call and the targets change their status to "draining". New connections are no longer routed to these targets.
If the deregistration delay elapses and there are still in-flight requests, the ALB terminates them and clients receive 5XX responses originating from the ALB.
The targets are deregistered from the target group.
ECS will send the stop call to the tasks and the ECS-agent will gracefully stop the containers (SIGTERM).
If the containers are not stopped within the stop timeout period (ECS_CONTAINER_STOP_TIMEOUT, 30s by default), they are force-stopped (SIGKILL).
As per the ELB documentation [1], if a deregistering target has no in-flight requests and no active connections, Elastic Load Balancing immediately completes the deregistration process without waiting for the deregistration delay to elapse. However, even though deregistration is complete, the target of the old task is still shown in the target group console with status "draining" until the deregistration delay elapses.
ECS is designed to stop the task only after the draining process completes, as mentioned in the ECS documentation [2]; hence the ECS service waits for the target group to finish draining before issuing the stop call.
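If you need a faster (or safer) cut-over, both timers can be tuned; a minimal sketch, assuming you know the task's target group ARN (placeholder below) and can edit the instance's /etc/ecs/ecs.config:
# Shorten the ALB deregistration delay (default 300s) on the task's target group
aws elbv2 modify-target-group-attributes --target-group-arn <target-group-arn> --attributes Key=deregistration_delay.timeout_seconds,Value=30
# Give containers more time to exit after SIGTERM (ECS agent setting, per container instance)
echo ECS_CONTAINER_STOP_TIMEOUT=60s >> /etc/ecs/ecs.config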
Ref:
[1] Target Groups for Your Application Load Balancers - Deregistration Delay - https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#deregistration-delay
[2] Updating a Service - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html

Related

If an AWS spot instance is stopped by AWS and then restarts, will it just start where it left off?

I am running luigi, a pipeline manager which is processing 1000 tasks. Currently I poll for the AWS termination notice. If it is present, I requeue the job, wait 30 minutes, and then launch a new server starting all the tasks from scratch. However, sometimes it restarts the same job multiple times, which is inefficient.
Instead, I am considering using create_fleet with InstanceInterruptionBehaviour=Stop. If I do this, then when the instance restarts will it still be running the luigi daemon and retain the state of all the tasks?
All InstanceInterruptionBehaviour=Stop does is effectively shut down your EC2 instance rather than terminate it. Since the "persistent" request type is required, in addition to EBS-backed storage, you will keep all the data currently on the attached EBS volumes at the time of the instance stop.
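A minimal create-fleet sketch under those constraints (the launch template ID and capacity are placeholders; stop behaviour is only supported for persistent requests, i.e. fleets of type maintain, with EBS-backed instances):
aws ec2 create-fleet \
  --type maintain \
  --spot-options '{"InstanceInterruptionBehavior":"stop"}' \
  --launch-template-configs '[{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-0123456789abcdef0","Version":"1"}}]' \
  --target-capacity-specification '{"TotalTargetCapacity":1,"DefaultTargetCapacityType":"spot"}'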
It is completely dependent on the application itself (Luigi in this case) to store the state of its execution and pick back up from where it left off. For one, you'll want to ensure the service daemon is enabled so it starts automatically on boot (example):
sudo systemctl enable yourservice
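For instance, a minimal systemd unit for the luigi scheduler daemon could look like the following (the unit name, user, and paths are assumptions; adapt them to your installation):
# Write a hypothetical unit file for luigid, then enable and start it
sudo tee /etc/systemd/system/luigid.service <<'EOF'
[Unit]
Description=Luigi scheduler daemon
After=network.target

[Service]
User=luigi
ExecStart=/usr/local/bin/luigid --state-path /var/lib/luigi/state.pickle
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now luigid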

Stopping ec2 to scale-down before it completes the process running on it

We have an application which runs on EC2 instances that we use as Docker hosts in an ECS cluster. There are multiple tasks running on each EC2 instance. Each task picks up one message from SQS and processes an event (which converts data from one format to another and uploads it to a file system), which may take from a few seconds to 12-15 hours depending on the size of the data. Once an event's processing is completed, the task is stopped, and for a new message (event) a new task is created. Whenever there is a huge number of messages in SQS, we scale up the instances to process the messages (to avoid wait time). When (number of messages) < (number of running tasks) for a certain duration, we need to scale down, i.e. terminate EC2 instances.
For EC2 scale-down we need to make sure no task is running, i.e. no container is processing any event on it. There is no way to find out which EC2 instances are free (not processing any event), so we mark the container instance as DRAINING and then terminate the EC2 instance. But when we mark a container instance as DRAINING, the tasks running on it are stopped (hence event processing is killed midway and data is lost). Is there any way we can complete the processing before the tasks are stopped, or can anyone suggest a better approach?
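For reference, the draining step described above presumably uses something like this (cluster name and container instance ARN are placeholders):
# Mark a container instance as DRAINING so ECS stops placing new tasks on it
aws ecs update-container-instances-state --cluster my-cluster --container-instances <container-instance-arn> --status DRAINING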

Mesos task history after restart

I am using Mesos for container orchestration and get task history from Mesos using the /tasks endpoint.
Mesos is running in a 7-node cluster and ZooKeeper is running in a 3-node cluster. I assume Mesos uses ZooKeeper to store the task history. We sometimes lose history when we restart Mesos. Does it store it in memory? I am trying to understand what is happening here.
My questions are,
Where does it store task histories?
How can we configure the task history cleanup policy?
Why do we lose complete task history on restarting Mesos?
To answer your questions:
Task history/state for Mesos is stored in memory and in the replicated_log (details here). The default is to use the replicated_log; to store state completely in memory, without the replicated_log, you would have to specify this in your Mesos flags, as shown on the configuration page, with --registry=in_memory.
Most users configure task history cleanup with these three flags (there are more, but these are the most common): --max_completed_frameworks=VALUE, --max_completed_tasks_per_framework=VALUE, and --max_unreachable_tasks_per_framework=VALUE, as described in the same document.
Yes, task history for the /tasks endpoint is lost every time a Mesos Master is restarted. However, the /state endpoint will still contain all task status changes over time.
**Edited to reflect information about the /tasks endpoint, not the /state endpoint.
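A sketch of how those flags might be passed when starting the master (the ZooKeeper hosts and limit values are illustrative only):
# Keep registry state in the replicated log (the default) and bound completed-task history
mesos-master --zk=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --quorum=2 \
  --work_dir=/var/lib/mesos \
  --registry=replicated_log \
  --max_completed_frameworks=50 \
  --max_completed_tasks_per_framework=1000 \
  --max_unreachable_tasks_per_framework=1000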

How to replace ECS cluster instances without downtime or reduced redundancy?

I currently have a try-out environment with ~16 services divided over 4 micro-instances. Instances are managed by an autoscaling group (ASG). When I need to update the AMI of my cluster instances, currently I do:
Create new launch config, edit ASG with new launch config.
Detach all instances with replacement option from the ASG and wait until the new ones are listed in the cluster instance list.
MANUALLY find and deregister the old instances from the ECS cluster (very tricky)
Now the services are killed by ECS due to deregistering the instances :(
Wait 3 minutes until the services are restarted on the new instances
MANUALLY find the EC2 instances in the EC2 instance list and terminate them (be very very careful not to terminate the new ones).
With this approach I have about 3 minutes of downtime, and I shiver at the idea of doing this in production environments. Is there a way to do this without downtime while keeping the overall number of instances the same (so without 200% scaling settings, etc.)?
You can update the Launch Configuration with the new AMI and then assign it to the ASG. Make sure to include the following in the user-data section:
echo ECS_CLUSTER=your_cluster_name >> /etc/ecs/ecs.config
Then terminate one instance at a time, and wait until the new one is up and automatically registered before terminating the next.
This could be scripted and automated too.
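A sketch of one such iteration (cluster name, container instance ARN, and instance ID are placeholders); draining first avoids killing tasks abruptly:
# Drain the old container instance so its tasks are rescheduled onto the new one
aws ecs update-container-instances-state --cluster my-cluster --container-instances <old-container-instance-arn> --status DRAINING
# Once its running task count reaches 0, terminate it; the ASG launches a replacement
# from the new launch configuration without reducing the desired capacity
aws autoscaling terminate-instance-in-auto-scaling-group --instance-id <old-instance-id> --no-should-decrement-desired-capacity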

Error in autoscaling in ec2?

I am using the autoscaling feature.
I set up the entire thing, but instances were automatically launched and terminated even though they did not reach the threshold.
I followed the steps:
Created an instance
Created a load balancer and registered an instance
Created a launch configuration
Created a CloudWatch alarm for CPU >= 50%
Created an auto scaling policy that launches and terminates instances when CPU >= 50%
But as soon as I apply the policy, instances begin to launch and terminate without any CPU load, and it continues:
Cause: At 2014-01-14T10:51:08Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.
StartTime: 2014-01-14T10:51:08.791Z
Cause: At 2014-01-14T10:02:16Z an instance was taken out of service in response to a system health-check.
UPDATE
Documentation:
Follow the instructions on how to Set Up an Auto-Scaled and Load-Balanced Application
Notes:
An instance created outside of the Auto Scaling group can be added to the Elastic Load Balancer, but it will not be monitored or managed by the Auto Scaling group.
An instance created outside of the Auto Scaling group can be marked as unhealthy by the Elastic Load Balancer if its health check fails, but that will not cause the Auto Scaling group to spawn a new instance.
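A rough sketch of that setup (names, AMI ID, and subnet are placeholders), so that instances are launched and managed by the group itself instead of being registered by hand:
# Launch configuration used by the Auto Scaling group
aws autoscaling create-launch-configuration --launch-configuration-name my-launch-config --image-id ami-0123456789abcdef0 --instance-type t2.micro
# Group attached to the (classic) load balancer, using ELB health checks
aws autoscaling create-auto-scaling-group --auto-scaling-group-name my-asg --launch-configuration-name my-launch-config --min-size 1 --max-size 3 --desired-capacity 1 --load-balancer-names my-load-balancer --health-check-type ELB --health-check-grace-period 300 --vpc-zone-identifier subnet-0123456789abcdef0
# Simple scaling policy to hook up to the CPU >= 50% CloudWatch alarm
aws autoscaling put-scaling-policy --auto-scaling-group-name my-asg --policy-name scale-out-on-cpu --scaling-adjustment 1 --adjustment-type ChangeInCapacity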
