How to use ELB and Auto Scaling termination for long-lived connections - amazon-ec2

I want to set up Auto Scaling groups that launch and terminate instances based on CPU load. But our connections usually stay open for a long time, often more than 8 hours. When I use an NLB, the deregistration delay is only supported up to 3600 seconds; after that the NLB forcefully drops the connection, which causes our long-lived connections to fail, and Auto Scaling terminates the instances as well.
How do I make sure that all connections to the target group finish processing (8-10 hours) before the NLB deregisters the target or Auto Scaling terminates the instance?
I checked ASG lifecycle hooks, and they seem to hold connections only for up to 2 hours.
Is it possible to deregister an instance from the target group only after all its connections have drained, and then terminate it via the ASG?

There isn't any good/easy way to do what you want to do. What are the instances doing that can last up to 10 hours?
Depending on your work type, this is the best workaround I can think of, but it would probably involve a bit of rearchitecting.
1) Design your application so that all data is stored off the instance in some sort of data tier (S3, RDS, EFS, etc). When an instance is done doing whatever it's doing, save that info to the data tier. This way a user request can go to any instance and get the same information
2) The ASG decides to scale in
3) You have a lifecycle hook configured and a CloudWatch notification set up to trigger when an instance enters the Terminating:Wait state, which notifies the instance
4) The instance periodically sends a heartbeat to the lifecycle hook, which can extend the hook's timeout for up to 2 days
5) Whenever the instance finishes what it's doing, it saves the information out to the data tier mentioned in 1), and the client can connect to a new instance to get the information that was being processed on the old one
https://docs.aws.amazon.com/cli/latest/reference/autoscaling/record-lifecycle-action-heartbeat.html
https://docs.aws.amazon.com/cli/latest/reference/autoscaling/complete-lifecycle-action.html
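A minimal sketch of steps 4) and 5) with boto3, assuming a hook named "drain-hook" on an ASG named "my-asg" (both hypothetical; in practice the instance ID would be read from instance metadata):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Hypothetical values for illustration only.
    HOOK = "drain-hook"
    ASG = "my-asg"
    INSTANCE_ID = "i-0123456789abcdef0"

    def extend_timeout():
        # Each heartbeat restarts the hook's timeout; the overall wait is
        # capped (up to the 2 days mentioned above).
        autoscaling.record_lifecycle_action_heartbeat(
            LifecycleHookName=HOOK,
            AutoScalingGroupName=ASG,
            InstanceId=INSTANCE_ID,
        )

    def finish():
        # Call once the long-running work has been saved to the data tier.
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=HOOK,
            AutoScalingGroupName=ASG,
            InstanceId=INSTANCE_ID,
            LifecycleActionResult="CONTINUE",
        )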

Try using the scaling cooldown period. By default the cooldown period is 300 seconds; you can increase that number, which gives instances more time between scale-in activities.
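For reference, a minimal sketch of raising the cooldown with boto3, assuming a hypothetical ASG name (note this spaces out scaling activity but does not by itself protect in-flight connections):

    import boto3

    boto3.client("autoscaling").update_auto_scaling_group(
        AutoScalingGroupName="my-asg",  # hypothetical
        DefaultCooldown=900,            # seconds; the default is 300
    )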

Related

Initiate EC2 Instance Shutdown via EstimatedCharges Threshold

I am using amazon-ec2 and currently have a couple of CloudWatch Alarms set using the EstimatedCharges Threshold at different price ranges.
While my request here would obviously be a last/worst case situation, I am wondering how it would be possible to do this, if it even is.
What I am wanting to do is to set up an alarm that will (somehow, how??) initiate a shutdown of a specific EC2 instance when that alarm goes from a state of OK to a state of ALARM.
I do not want to TERMINATE the instance, just a SHUTDOWN.
The idea here being making sure that a monthly bill does not all of a sudden go way beyond what can be afforded, even if it does mean shutting down the entire server.
Maybe there is another/better method of doing what I am after via another AWS service; if so, I would love to know about that.
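One possible pattern (an assumption on my part, not a confirmed answer from this thread): have the EstimatedCharges alarm notify an SNS topic with a small Lambda function subscribed, which stops (not terminates) the specific instance:

    import boto3

    INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical instance to shut down

    def handler(event, context):
        # StopInstances shuts the instance down but keeps it (and its EBS
        # volumes) around to Start again later; it does not terminate it.
        boto3.client("ec2").stop_instances(InstanceIds=[INSTANCE_ID])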

Ensure the availability of a pool of stopped ec2 instances

I want to maintain a pool of stopped Amazon EC2 instances. Whenever the count falls below a threshold, I would like to create new instances and then immediately stop them once they are running. Is this possible within the Amazon infrastructure alone?
You can certainly create Amazon EC2 instances and then Stop them, making them available to Start later. As you point out, this has the benefit that Starting an instance is faster than Launching a new one.
There is no automated method to assist with this. You would have to code a solution that does the following:
Monitor the number of Stopped instances
If the quantity is below the threshold, launch a new instance
The new instance could automatically stop itself via User Data (either via a Shutdown command to the Operating System, or via a StopInstances call to EC2)
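A minimal sketch of that loop, with a hypothetical AMI ID, instance type, and pool size (EBS-backed instances Stop, rather than terminate, on an OS shutdown by default):

    import boto3

    ec2 = boto3.client("ec2")
    POOL_SIZE = 5  # desired number of stopped instances

    def top_up_pool():
        stopped = ec2.describe_instances(
            Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]
        )
        count = sum(len(r["Instances"]) for r in stopped["Reservations"])
        for _ in range(max(0, POOL_SIZE - count)):
            # Per the considerations below, a real version should wait for
            # each instance to reach 'stopped' before launching the next.
            ec2.run_instances(
                ImageId="ami-0123456789abcdef0",  # hypothetical
                InstanceType="t3.micro",
                MinCount=1,
                MaxCount=1,
                # User data runs at first boot and immediately halts the OS.
                UserData="#!/bin/bash\nshutdown -h now\n",
            )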
Some things you would have to consider:
What triggers the monitoring? Would it be on a schedule?
The task that launches a new instance would need to wait for the new instance to Launch & Stop before launching any more instances
What Starts the instances when they are needed?
Do instances ever get Stopped when they are no longer required?
The much better choice would be to use Auto Scaling, with a scale-out alarm based on some metric that says your fleet is busy, and a scale-in alarm to remove instances when the fleet is not busy. The scale-out alarm could be set to launch instances once a threshold is passed (eg 80% CPU), which should allow the new instance(s) to launch before things are 100% busy. The time difference between launching a new instance and starting an existing instance is quite small (at least for Linux).
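A minimal sketch of such a policy using target tracking, assuming a hypothetical ASG name (target tracking is a newer mechanism than the alarm-per-threshold setup described above, but it achieves the same effect):

    import boto3

    boto3.client("autoscaling").put_scaling_policy(
        AutoScalingGroupName="my-asg",  # hypothetical
        PolicyName="cpu-target",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            # Keep average CPU near 80% so new instances launch before
            # the fleet is 100% busy.
            "TargetValue": 80.0,
        },
    )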
If you're using Windows, the biggest time delay when launching a new instance is due to Sysprep, which makes a "clean" machine with new Unique IDs. You could cheat by creating an AMI without Sysprep, which would boot faster.
Perhaps I am misunderstanding your objective... you can't "ensure availability" of instances without paying for them.
Instances in the stopped state are only logical entities that don't physically exist anywhere -- hardware is allocated on launch, deallocated on stop, and reallocated on the next start. In the unlikely condition where an availability zone is exhausted of capacity for a given instance class, stopped instances of that class won't start, because there is no hardware available for them to be deployed onto.
To ensure that instances are always available, you have to reserve them, and you have to specify the reservations in a specific availability zone:
Amazon EC2 Reserved Instances provide a significant discount (up to 75%) compared to On-Demand pricing and provide a capacity reservation when used in a specific Availability Zone. [emphasis added]
https://aws.amazon.com/ec2/pricing/reserved-instances/
Under most plans, reserved instances are billed the same rate whether they are running or not, so there would be little point in stopping them.

ElastiCache Maintenance Window Availability

We are planning to use ElastiCache (Redis) instead of our own Redis cluster. However, the "maintenance window" setting raises some questions:
If I use a multi-az replicated cluster, will elasticache failover to available replicas during maintenance windows or does the entire cluster go down during maintenance?
How long does it generally take?
We could also use Memcached instead of Redis; does it have a better availability situation during maintenance windows?
How do others handle ElastiCache maintenance windows? Just go with the downtime?
Thanks!
There are usually two kinds of maintenance AWS performs:
1. Continuous managed maintenance updates
2. Service updates
When creating a cluster you need to specify a 60-minute maintenance window. Usually all the managed maintenance updates (1) happen during that time.
For service updates (2) you will receive a notification when one is scheduled, as an email or a notice on the ElastiCache console page, etc. Based on the notification you can reschedule the service update to a convenient time. If you fail to reschedule, it will by default be applied during your maintenance window.
Basically, during a maintenance update AWS replaces your node with a new node that has the required updates. If you have a primary/replica setup with Multi-AZ and automatic failover enabled, then during maintenance on the primary node your replica is promoted to primary and your read/write requests are served from there. So ideally you don't see any issue during maintenance, maybe a few seconds of downtime while the replica is promoted.
If you don't enable Multi-AZ with automatic failover, or your ElastiCache cluster has just one node, you will see downtime during the maintenance window.
Refer to the AWS documentation.
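A minimal sketch of a setup that avoids that downtime, with hypothetical IDs and window:

    import boto3

    boto3.client("elasticache").create_replication_group(
        ReplicationGroupId="my-redis",  # hypothetical
        ReplicationGroupDescription="Redis with automatic failover",
        Engine="redis",
        CacheNodeType="cache.t3.micro",
        NumCacheClusters=2,             # one primary + one replica
        AutomaticFailoverEnabled=True,
        MultiAZEnabled=True,
        PreferredMaintenanceWindow="sun:05:00-sun:06:00",
    )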
How long does it generally take?
Under 60 minutes:
"If a "maintenance" event is scheduled for a given week, it will be initiated and completed at some point during the 60 minute maintenance window you identify."
How often:
Software patching occurs infrequently (typically once every few months) and should seldom require more than a fraction of your maintenance window. If you do not specify a preferred weekly maintenance window when creating your Cache Cluster, a 60 minute default value is assigned.
http://aws.amazon.com/elasticache/faqs/

Amazon Auto Scaling is ping-ponging since the health checks fail

I have my Auto Scaling Group configured like so:
My load balancer, when it registers an instance, fails because the health check fails while the instance is still loading.
For what it's worth, my health check is as follows:
I've added the "web-master" instance myself, outside of the Auto Scaling group, since what normally happens is that the registering instance fails to add itself to the load balancer, terminates, and a new one pops up. This happens countless times until I manually intervene. What am I doing wrong? Is there any way to delay the ELB health check, or at least have it wait till the instance is fully registered?
An instance shouldn't register until it's passed its availability tests. Could it be that it's your application's spin-up time that's the issue, not the instance?
I guess my question is: is your application set up on the port you're pinging? I have a lightweight 'healthcheck' app in IIS for health checks to get around the 'ping pong' effect.
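One knob worth checking (an assumption on my part, not something the answer above confirms) is the ASG health check grace period, which tells Auto Scaling to ignore ELB health check results for a while after launch so a slow-starting application isn't terminated before it finishes booting:

    import boto3

    boto3.client("autoscaling").update_auto_scaling_group(
        AutoScalingGroupName="my-asg",  # hypothetical
        HealthCheckType="ELB",
        HealthCheckGracePeriod=600,     # seconds to ignore health checks after launch
    )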

Amazon EC2 autoscaling down with graceful shutdown?

We're looking at using EC2 Auto Scaling to deal with spikes in load. In our case we want to scale up instances based on SQS queue size and then scale down when the queue size gets back under control. Each SQS message defines a potentially long-running job (sometimes up to 20 minutes per message) that must complete before the instance can be terminated.
Our software handles the shutdown process gracefully, so issuing sudo service ourapp stop will wait for the app to complete before returning.
My question: when Auto Scaling starts scaling down it issues a terminate (which apparently is like hitting the power button) -- will it wait for our app to completely exit before the instance is 'powered off'?
https://forums.aws.amazon.com/message.jspa?messageID=180674 <- that and other things I've found seem to suggest that it doesn't
On most newer AMIs, the machines are given the equivalent of a 'halt' (or 'shutdown -h now') command so that services are gracefully shut down. As long as your program plays nicely with the startup/shutdown scripts, you should be fine -- but if your program takes more than 20 seconds to terminate, you may find that Amazon kills the instance completely.
Amazon's documentation on Auto Scaling doesn't specify the termination process, but AWS's documentation for EC2 in general does describe what happens during termination -- the machine is given a 'shutdown' command, and the default shutdown timeout on most systems is 30 seconds.
In mid 2014 AWS introduced 'lifecycle hooks', which allow full control of the termination process.
Our high level down scale process is:
Auto Scaling sends a message to an SQS queue with an instance ID
Controller app picks up the message
Controller app issues a 'stop instance' request
Controller app re-queues the SQS message while the instance is stopping
Controller app picks up the message again, checks if the instance has stopped (or re-queues the message to try again later)
Controller app notifies Auto Scaling to 'CONTINUE' with the termination
Controller app deletes the message from the SQS queue
More details: http://docs.aws.amazon.com/autoscaling/latest/userguide/lifecycle-hooks.html
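A minimal sketch of the hook setup and step 6, with hypothetical hook/ASG names (the one-time put_lifecycle_hook call would also pass NotificationTargetARN and RoleARN pointing at the SQS queue from step 1):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # One-time setup: pause terminating instances in Terminating:Wait.
    autoscaling.put_lifecycle_hook(
        LifecycleHookName="graceful-stop",  # hypothetical
        AutoScalingGroupName="my-asg",      # hypothetical
        LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
        HeartbeatTimeout=1800,              # seconds before DefaultResult applies
        DefaultResult="CONTINUE",
    )

    def proceed(instance_id):
        # Step 6: tell Auto Scaling to continue terminating the drained instance.
        autoscaling.complete_lifecycle_action(
            LifecycleHookName="graceful-stop",
            AutoScalingGroupName="my-asg",
            InstanceId=instance_id,
            LifecycleActionResult="CONTINUE",
        )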
Use the ReplaceUnhealthy process in Auto Scaling.
Refer to:
http://alestic.com/2011/11/ec2-schedule-instance
particularly the comments on that post.
