How do you configure AWS autoscaling to scale up quickly? I've set up an AWS autoscaling group with an ELB. All is working well, except that it takes several minutes before new instances are added and come online. I came across the following in a post about Puppet and autoscaling:
The time to scale can be lowered from several minutes to a few seconds if the AMI you use for a group of nodes is already up to date.
http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
Is this true? Can time to scale be reduced to a few seconds? Would using puppet add any performance boosts?
I also read that smaller instances start quicker than larger ones:
Small Instance 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform with a base install of CentOS 5.3 AMI
Amount of time from launch of instance to availability:
Between 5 and 6 minutes us-east-1c
Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform with a base install of CentOS 5.3 AMI
Amount of time from launch of instance to availability:
Between 11 and 18 minutes us-east-1c
Both were started via the command line using Amazon's tools.
http://www.philchen.com/2009/04/21/how-long-does-it-take-to-launch-an-amazon-ec2-instance
I note that the article is old and my c1.xlarge instances are certainly not taking 18min to launch. Nonetheless, would configuring an autoscale group with 50 micro instances (with an up scale policy of 100% capacity increase) be more efficient than one with 20 large instances? Or potentially creating two autoscale groups, one of micros for quick launch time and one of large instances to add CPU grunt a few minutes later? All else being equal, how much quicker does a t1.micro come online than a c1.xlarge?
You can increase or decrease an autoscaler's reaction time by adjusting the "--cooldown" value (in seconds).
As for which instance types to use, that depends mostly on the type of application; decide only after close performance monitoring and tuning in production.
The time to scale can be lowered from several minutes to a few seconds
if the AMI you use for a group of nodes is already up to date. This
way, when Puppet runs on boot, it has to do very little, if anything,
to configure the instance with the node’s assigned role.
The advice here is about keeping your AMI (the snapshot of your operating system) as up to date as possible. That way, when autoscaling brings up a new machine, Puppet doesn't have to install lots of software as it would on a blank AMI; it may just need to pull some updated application files.
Depending on how much work your Puppet scripts do (apt-get install, compiling software, etc) this could save you 5-20 minutes.
The two other factors you have to worry about are:
How long it takes your scaling policy to determine that you need more resources (e.g. a policy that dictates "new machines should be added when CPU is above 90% for more than 5 minutes" would be less responsive, and more likely to lead to timeouts, than "new machines should be added when CPU is above 60% for more than 1 minute")
How long it takes to provision a new EC2 instance (smaller instance types tend to take less time to provision)
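To make the difference between those two policies concrete, here is a minimal Python sketch. It assumes one CPU sample per minute and a simplified alarm that fires once the threshold has been exceeded for the required number of consecutive minutes. The function name and the load ramp are made up for illustration; this is not a CloudWatch API.

```python
def minutes_until_alarm(cpu_samples, threshold, sustained_minutes):
    """Return the 1-based minute at which the alarm would fire, or None."""
    streak = 0
    for minute, cpu in enumerate(cpu_samples, start=1):
        # Count consecutive samples above the threshold; reset on any dip.
        streak = streak + 1 if cpu > threshold else 0
        if streak >= sustained_minutes:
            return minute
    return None


# Hypothetical load: CPU starts at 30%, climbs 10 points per minute, caps at 100%.
ramp = [min(100, 30 + 10 * m) for m in range(20)]

aggressive = minutes_until_alarm(ramp, threshold=60, sustained_minutes=1)
conservative = minutes_until_alarm(ramp, threshold=90, sustained_minutes=5)
print(aggressive, conservative)
```

With this particular ramp the 60%/1-minute policy triggers at minute 5 and the 90%/5-minute policy at minute 12, and the instance launch time still has to be added on top of either figure.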
How soon the ASG responds depends on three things:
1. Step: how much to increase by, as a percentage or a fixed number. With a large step you can increase capacity rapidly, because the ASG launches the entire step in one go.
2. Cooldown period: this governs how soon the next increase can happen. If the previous increase is still within the defined cooldown period (in seconds), the ASG waits and takes no further scaling action yet. A small cooldown period lets the next step happen sooner.
3. AMI type: how long an AMI takes to launch depends on the type of AMI, and many factors come into play. All else being equal, fully baked AMIs launch much faster.
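As a rough illustration of how these three factors combine, here is a back-of-the-envelope Python sketch. It is a deliberately simplified model, not how the ASG actually schedules activities: it assumes every scaling action adds exactly `step` instances, consecutive actions are separated by exactly one cooldown, and every instance takes the same `launch_s` seconds to come into service. All names and numbers are illustrative.

```python
import math


def seconds_to_reach(current, target, step, cooldown_s, launch_s):
    """Estimate seconds until `target` instances are in service."""
    if target <= current:
        return 0
    # Number of scaling actions needed to cover the gap.
    actions = math.ceil((target - current) / step)
    # The last action starts after (actions - 1) cooldowns, then has to boot.
    return (actions - 1) * cooldown_s + launch_s


# Going from 2 to 10 instances with a 300 s cooldown and a 120 s launch time:
small_steps = seconds_to_reach(2, 10, step=2, cooldown_s=300, launch_s=120)
one_big_step = seconds_to_reach(2, 10, step=8, cooldown_s=300, launch_s=120)
print(small_steps, one_big_step)  # 1020 120
```

Under these assumptions, a larger step removes the cooldown waits entirely, and a faster-launching (fully baked) AMI shortens the remaining term.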
I have a use case where we have a very large computation job that can be broken up into many small units of work fairly efficiently. There could be, let's say, 1,000 hours of computational work for an m4.large instance. If I wanted the result back within the next 10 minutes, I would need 6,000 instances to get the job done in time.
So far I have set up AWS Batch, but I haven't used more than the 20 m4.large instances an account comes with. I know I can ask AWS to raise the number of instances, but I still don't know much about what happens if you suddenly try to provision thousands of on-demand instances, or whether AWS limits how many instances you can use.
So my question is: am I able to launch thousands of m4.large instances on demand? And if so, what sort of times would I be looking at for all instances to reach the Running state?
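The instance count in the question follows from simple arithmetic, sketched below in Python. It assumes the work divides perfectly across instances with no startup or coordination overhead, which is the question's own simplification; the function name is illustrative.

```python
import math


def instances_needed(total_instance_hours, deadline_minutes):
    """Instances required to finish the work by the deadline, assuming perfect parallelism."""
    total_instance_minutes = total_instance_hours * 60
    return math.ceil(total_instance_minutes / deadline_minutes)


print(instances_needed(1000, 10))  # 6000
print(instances_needed(1000, 60))  # 1000
```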
I have done this many times with ~100 instances but never in the thousands of instances.
STEP 1: Open a support ticket with AWS. You will need to get your account approved, credit checked, etc. My customers are very big companies, so for them the credit and approval process is easy. If you are a little guy, I don't know.
STEP 2: Think thru your VPC design and how you will address that many instances. It is one thing to have 5 instances going thru a NAT Gateway, but a hundred systems will bring Internet connectivity to its knees.
STEP 3: Think thru the networking bandwidth required. Do you need placement groups or very high speed Intranet or Internet connectivity?
STEP 4: Be prepared for the possibility that you cannot launch all instances with a specific instance type (capacity not available error). Have a selection of instance types that you can fall back on.
STEP 5: Create your own software, I use Python, to launch the instances, perform updates, install software, etc. You can then poll the instances using the Boto3 EC2 API to determine when all the instances are running. The length of time for 1,000 instances won't be much different than 1 instance.
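The polling part of STEP 5 could look roughly like the sketch below. It is not the answerer's actual tooling. `ec2_client` is assumed to be a Boto3 EC2 client (for example `boto3.client("ec2")`), and only its `describe_instances` call is used. The function name, the chunk size of 100 IDs per call, and the polling defaults are arbitrary choices made for this example.

```python
import time


def wait_until_running(ec2_client, instance_ids, poll_seconds=15, max_polls=40, chunk=100):
    """Poll until every instance reports state 'running'; return False on timeout."""
    pending = set(instance_ids)
    for _ in range(max_polls):
        ids = sorted(pending)
        # Query only the instances that are not running yet, a chunk at a time.
        for start in range(0, len(ids), chunk):
            resp = ec2_client.describe_instances(InstanceIds=ids[start:start + chunk])
            for reservation in resp["Reservations"]:
                for inst in reservation["Instances"]:
                    if inst["State"]["Name"] == "running":
                        pending.discard(inst["InstanceId"])
        if not pending:
            return True
        time.sleep(poll_seconds)
    return False
```

Note that "running" only means the instance has started booting; any updates or software installation still have to finish before the machine can do useful work.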
Now for the real world. If your job takes 1,000 hours, launching 1,000 instances will not reduce it to 1 hour unless you have a really scalable software design with minimal inter-machine communication. Once you go beyond 10 systems, networking bandwidth and communications overhead become an issue. Even though AWS's resources are huge, one customer launching 1,000 EC2 instances at one time is not a common launch case.
I would also NOT launch 1,000 instances to get processing down to 10 minutes. It can take 10 minutes for your instances to come online, get updated, synchronize, etc. This means that you will be spending 50% of your budget on waiting time. For really large jobs today we prefer to use Hadoop / Spark where scaling to hundreds of machines is realistic.
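The 50% figure can be checked with a few lines of Python. This sketch assumes you pay for the whole time an instance is up and that startup takes the same time on every instance; the function name is illustrative.

```python
def startup_overhead_fraction(startup_minutes, compute_minutes):
    """Share of paid instance time spent waiting rather than computing."""
    return startup_minutes / (startup_minutes + compute_minutes)


# 10 minutes of startup followed by 10 minutes of work: half the spend is waiting.
print(startup_overhead_fraction(10, 10))  # 0.5
# Using fewer instances for 60 minutes of work each shrinks the overhead.
print(round(startup_overhead_fraction(10, 60), 3))  # 0.143
```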
You can contact AWS Customer Service to increase your EC2 limits (use the link shown in the Limits section of the EC2 management console). They will verify your use-case.
You might also consider using Spot Pricing to lower your costs. Spot instances take longer to provision.
Sample use-case: Gigaom | Cycle Computing once again showcases Amazon’s high-performance computing potential
There are also services like Spotinst that can help you provision servers at the lowest possible cost.
Which Amazon EC2 instance type should I choose for an application that only receives JSON, transforms it, saves it to a database, and returns JSON?
Java (Spring) + PostgreSQL
Expected load: 10k req/sec.
Your application is CPU-bound, so you should choose a compute-optimized instance; C4 is the latest generation of compute-optimized instances.
I had a similar application requirement, and with a c4.xlarge I could get 40k requests/min on a single server within an SLA of 10 ms per request. You can also benchmark your application by running a stress test on different C4 instance sizes.
Check out https://aws.amazon.com/ec2/instance-types/, the AWS documentation on the different instance types and their use cases.
You can also check the CPU usage on your instance by looking at the CloudWatch metrics or by running the top command on your Linux instance.
Make sure that your instance does not exceed 75% CPU utilization.
You can start with a smaller instance and gradually move to a larger server in the C4 family if you see that CPU utilization is becoming the bottleneck. This is how I found the right instance type for my application, keeping the SLA within 10 ms of server time.
P.S. In my case the DB was deployed on the same server, so throughput was lower; it will increase if you run the DB on a separate server.
Let me know if you need any other info.
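For scale, the measured figure above can be compared with the 10k req/sec target using a short Python sketch. It assumes throughput scales linearly with the number of servers and that your application behaves like the one measured here (which also hosted the database), so treat the result as an order-of-magnitude estimate only. The function name is illustrative.

```python
import math


def servers_for_target(target_req_per_sec, measured_req_per_min_per_server):
    """Servers needed if throughput scales linearly with server count."""
    target_req_per_min = target_req_per_sec * 60
    return math.ceil(target_req_per_min / measured_req_per_min_per_server)


# 10k req/sec is 600k req/min; at 40k req/min per c4.xlarge that is 15 servers.
print(servers_for_target(10_000, 40_000))  # 15
```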
Let's say that every request requires 20 ms of CPU processing time (not counting the waits on I/O operations); then each core will be able to process around 50 requests per second. To process 10k requests per second you will need 200 cores, which can be achieved with 16 vCPUs with 16 cores each (16 × 16 = 256).
Having said that, you can then select the right instance for your needs using the ec2 selector tool, for instance:
These are all the instance types with 16x16 cores for less than $10k/year.
If, on the other hand, you're fine with "just" 64 cores in total, then take a look at these.
If you have other constraints or if my assumptions weren't correct you can change the filters accordingly and choose the best type that suits your needs.
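The core estimate in this answer can be reproduced with the Python sketch below. It assumes each request costs a fixed amount of CPU time and that cores are never idle waiting on I/O, which is the answer's own simplification; the function names are illustrative.

```python
import math


def requests_per_core_per_second(cpu_ms_per_request):
    """How many requests one fully busy core can serve per second."""
    return 1000 / cpu_ms_per_request


def cores_needed(target_req_per_sec, cpu_ms_per_request):
    """Cores required to sustain the target rate at 100% CPU utilization."""
    return math.ceil(target_req_per_sec * cpu_ms_per_request / 1000)


print(requests_per_core_per_second(20))  # 50.0
print(cores_needed(10_000, 20))          # 200
```

Changing `cpu_ms_per_request` to a value measured on your own application is the quickest way to see how sensitive the estimate is to that assumption.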
At the end of last month, I started experimenting with Amazon EC2. I launched one or two t2 micro instances, to experiment for free under the free tier.
However, I quickly noticed on my billing dashboard that my service usage was increasing fast, and after a few days it was forecast to exceed the free tier limits by the end of the month. Since I don't need to maintain a 24/7 online presence at the moment and mostly use my instances to experiment, I have terminated one and stopped another. I have even detached the volume of the stopped instance.
All I have left are:
One stopped micro instance
One detached volume
Two snapshots of that volume
One AMI that I had used to move my instance from one region to another in the beginning
As far as I understand it, those are more or less "idle" resources, or at least not highly active ones. And yet, in my billing dashboard, in the row EC2 - Linux, the month-to-date usage keeps increasing, at a rate of more than one hour of usage per real-time hour.
Before I detached the volume, the usage would increase by 14 hours in only 3 hours. Now that I have detached it, it slowed down a bit, but still increased by 16 hours in the last 13 hours. All that, again, without actively running anything!
I am aware that the tally of the usage isn't strictly limited to running instances, but to associated resources as well. But still, it seems very high to me. I don't even dare to test my app anymore.
I would like to know if such an increase is normal, or if there may be something wrong with my account and how I configured my instances. If it is normal, any indications as to what actions I could take to reduce this increase would be very welcome!
Note that I did contact Amazon support several days ago and didn't get a single reply, which is why I'm turning here.
Thanks in advance!
Edit: Solved. I did have an instance running in another region, probably by mistake because I had never interacted directly with that region. See comments for a script to systematically check all regions and avoid this kind of stupid mistake.
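A minimal sketch of such a check is shown below; it is not the script from the comments, which is not included here. The core function takes a `make_client` factory so it can be exercised without AWS access. `main()` shows one assumed way to wire it to Boto3 (`boto3.client`, `describe_regions`, `describe_instances`), which requires Boto3 to be installed and credentials to be configured.

```python
def running_instances_by_region(region_names, make_client):
    """Map each region that has running instances to their instance IDs."""
    found = {}
    for region in region_names:
        ec2 = make_client(region)
        kwargs = {"Filters": [{"Name": "instance-state-name", "Values": ["running"]}]}
        ids = []
        while True:
            resp = ec2.describe_instances(**kwargs)
            for reservation in resp["Reservations"]:
                ids.extend(inst["InstanceId"] for inst in reservation["Instances"])
            # Follow pagination if the response was truncated.
            token = resp.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
        if ids:
            found[region] = ids
    return found


def main():
    # Assumption: boto3 is installed and AWS credentials are configured.
    import boto3

    regions = [
        r["RegionName"]
        for r in boto3.client("ec2", region_name="us-east-1").describe_regions()["Regions"]
    ]
    result = running_instances_by_region(
        regions, lambda name: boto3.client("ec2", region_name=name)
    )
    for region, ids in result.items():
        print(region, ids)
```

Calling `main()` prints every region that still has running instances. Stopped instances, volumes, and snapshots are not covered by this sketch.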
I have set up AWS AutoScaling as follows:
1) created a Load Balancer and registered one instance with it;
2) added Health Checks to the ELB;
3) added 2 Alarms:
- CPU usage > 60% for 60s, spin up 1 instance;
- CPU usage < 40% for 120s, spin down 1 instance;
4) wrote a jMeter script to send traffic to the website in question: 250 threads, 200 seconds ramp up time, loop count 5.
What I am seeing is very strange.
I expected the CPU usage to shoot up with the higher number of users. Instead, the CPU usage stays between 20-30% (which is why the new instance never fires up), and the running instance starts throwing timeout errors once it reaches anything more than 100 users.
I am at a loss to understand why CPU usage is so low when the website is in fact timing out.
Ideas?
This could be a problem with the ELB. The ELB does not scale very quickly: it takes a consistent amount of traffic to the ELB to let Amazon know you need a bigger one. Hitting it really hard all at once does not help it scale, so the ELB could be having problems handling all the connections.
Is this SSL? Are you doing SSL on the ELB? That would add overhead to an underscaled ELB as well.
I would honestly recommend not using ELB at all. haproxy is a much better product and much faster in most cases. I can elaborate if needed, but just look at how Amazon handles the cname vs what you can do with haproxy...
It sounds like you are testing AutoScaling to ensure it will work for your needs. As a first pass to simply see if AS will launch a new instance, try reducing your CPU up check to trigger at 25%. I realize this is a lot lower than you are hoping to use moving forward, but it will help validate that your initial configuration is working.
As a second step, you should take a look at your application and see if CPU is the best metric to have AS monitor for scaling. It is possible that you have a bottleneck somewhere else in your app that may not necessarily be CPU related (web server tuning, memory, databases, storage, etc). You didn't mention what type of content you're serving out; is it static or generated by an interpreter (like PHP or something else)? You could also send your own custom metric data into CloudWatch and use this metric to trigger the scaling.
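Publishing a custom metric could look roughly like the Python sketch below. `cloudwatch_client` is assumed to be a Boto3 CloudWatch client (for example `boto3.client("cloudwatch")`). The namespace "MyApp" and the metric name "RequestLatency" are hypothetical placeholders, and request latency is only one possible choice of metric.

```python
def publish_latency_metric(cloudwatch_client, latency_ms, namespace="MyApp"):
    """Send one latency data point to CloudWatch as a custom metric."""
    cloudwatch_client.put_metric_data(
        Namespace=namespace,  # hypothetical namespace
        MetricData=[
            {
                "MetricName": "RequestLatency",  # hypothetical metric name
                "Value": float(latency_ms),
                "Unit": "Milliseconds",
            }
        ],
    )
```

An alarm on this metric, rather than on CPU, can then drive the scaling policy, which helps when the bottleneck is not CPU-related.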
You may also want to time how long it takes for an instance to be ready to serve traffic from a cold start. If it takes longer than 60 seconds, you may want to adjust your monitoring threshold time appropriately (or set cool down periods). As chantheman pointed out, it can take some time for the ELB to register the instance as well (and a longer amount of time if the new instance is in a different AZ).
I hope all of this helps.
What we discovered is that when you use autoscaling on t2 instances under heavy load, those instances run out of CPU credits and are then limited to 20% of CPU (from the monitoring point of view; htop inside the instance still shows 100%). Internally they are at maximum load.
This sends a misleading metric to autoscaling, and new instances will not be launched.
You need to change the metric, develop your own, or move to m instances.
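To get a feel for how quickly the credits run out under a load test, here is a rough Python sketch. It assumes one CPU credit pays for one vCPU at 100% for one minute, and that credits are earned at a constant hourly rate. The starting balance of 30 credits and the earn rate of 6 credits per hour are illustrative inputs only; check the current AWS figures for your instance size.

```python
def minutes_until_throttled(credit_balance, earn_per_hour, cpu_utilization):
    """Minutes of sustained load before the credit balance reaches zero.

    cpu_utilization is a fraction (1.0 means one vCPU fully busy).
    Returns None if the balance never runs out at this utilization.
    """
    spend_per_hour = 60 * cpu_utilization  # credits burned per hour
    net_drain_per_hour = spend_per_hour - earn_per_hour
    if net_drain_per_hour <= 0:
        return None
    return credit_balance / net_drain_per_hour * 60


# Illustrative inputs: 30 credits in hand, 6 earned per hour, CPU pinned at 100%.
print(round(minutes_until_throttled(30, 6, 1.0), 1))  # 33.3
```

Under these assumptions, a load test lasting longer than about half an hour would end up measuring the throttled instance rather than the application.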
It seems that for the cost of running 2 large instances, I can run about 40 micro instances. In a distributed system (MongoDB in my case), 40 micro instances sound a lot faster than 2 large instances, assuming the database files are on EBS in both cases.
Is this true?
Micro instances may have 97% CPU "steal" time, and they can be unresponsive for several seconds.
In many use cases it's not acceptable to have to wait 15 seconds for a reply. I think small instances are the best deal. I run several of them and I divide the risk of problems and the load among them.
source: personal experience and this article