AWS Micro instance vs Large instance: Cost Efficiency - amazon-ec2

It seems that for the cost of running 2 large instances, I can run about 40 micro instances. In a distributed system (MongoDB in my case), 40 micro instances sound a lot faster than 2 large instances, assuming the database files are on EBS in both cases.
Is this true?

Micro instances may have 97% CPU "steal" time, and they can be unresponsive for several seconds.
In many use cases it's not acceptable to wait 15 seconds for a reply. I think small instances are the best deal: I run several of them, dividing both the risk of problems and the load among them.
source: personal experience and this article

Behavior on startup times of EC2

I have a use case where we have a very large computation job which can be broken up into many small units of work fairly efficiently. Let's say there are effectively 1,000 hours of computational work for an m4.large instance. If I wanted the result back within the next 10 minutes, I would need 6,000 instances to get the job done in time.
So far I have set up AWS Batch, and I haven't used more than the 20 m4.large instances your account comes with. I know I can raise the number of instances requested from AWS, but I still don't really know what the behaviour is if you suddenly try to provision thousands of on-demand instances, or whether AWS limits how many instances you can use.
So my question is: am I able to launch thousands of m4.large instances on-demand? And if so, what sort of times would I be looking at for all instances to reach the Running state?
I have done this many times with ~100 instances but never in the thousands of instances.
STEP 1: Open a support ticket with AWS. You will need to get your account approved, credit checked, etc. My customers are very big companies, so for them the credit and approval process is easy. If you are a small shop, I don't know.
STEP 2: Think through your VPC design and how you will address that many instances. It is one thing to have 5 instances going through a NAT Gateway, but hundreds of systems will bring Internet connectivity to its knees.
STEP 3: Think through the networking bandwidth required. Do you need placement groups or very high-speed intranet or Internet connectivity?
STEP 4: Be prepared for the case where you cannot launch all instances with a specific instance type (a capacity-not-available error). Have a selection of instance types you can fall back on.
STEP 5: Write your own software (I use Python) to launch the instances, perform updates, install software, etc. You can then poll the instances using the Boto3 EC2 API to determine when they are all running. The launch time for 1,000 instances won't be much different from 1 instance.
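Steps 4 and 5 can be sketched with Boto3 along the following lines. This is only a sketch: the AMI ID and candidate instance types below are placeholders, and `ec2` is assumed to be a `boto3.client("ec2")` with valid credentials.

```python
import time

def launch_with_fallback(ec2, ami_id, count, candidate_types):
    """Try each instance type in order until capacity is granted (step 4)."""
    from botocore.exceptions import ClientError  # ships with boto3
    for itype in candidate_types:
        try:
            resp = ec2.run_instances(ImageId=ami_id, InstanceType=itype,
                                     MinCount=count, MaxCount=count)
            return [i["InstanceId"] for i in resp["Instances"]]
        except ClientError as err:
            if err.response["Error"]["Code"] == "InsufficientInstanceCapacity":
                continue  # no capacity for this type; fall back to the next
            raise
    raise RuntimeError("no candidate instance type had capacity")

def count_running(reservations):
    """Count 'running' instances in a describe_instances payload."""
    return sum(1 for r in reservations for i in r["Instances"]
               if i["State"]["Name"] == "running")

def wait_until_running(ec2, instance_ids, poll_seconds=15):
    """Poll the EC2 API until every launched instance is running (step 5)."""
    while count_running(ec2.describe_instances(
            InstanceIds=instance_ids)["Reservations"]) < len(instance_ids):
        time.sleep(poll_seconds)

# ec2 = boto3.client("ec2")  # requires AWS credentials
# ids = launch_with_fallback(ec2, "ami-12345678", 1000, ["m4.large", "m5.large"])
# wait_until_running(ec2, ids)
```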
Now for the real world. If your job takes 1,000 hours, launching 1,000 instances will not reduce it to 1 hour unless you have a really scalable software design with minimal inter-machine communication. Once you go beyond 10 systems, networking bandwidth and communication overhead become an issue. Even though AWS's resources are huge, launching 1,000 EC2 instances at one time by one customer is not a common launch case.
I would also NOT launch 1,000 instances to get processing down to 10 minutes. It can take 10 minutes for your instances to come online, get updated, synchronize, etc., which means you would be spending 50% of your budget on waiting time. For really large jobs today we prefer Hadoop / Spark, where scaling to hundreds of machines is realistic.
You can contact AWS Customer Service to increase your EC2 limits (use the link shown in the Limits section of the EC2 management console). They will verify your use-case.
You might also consider using Spot Pricing to lower your costs. Spot instances take longer to provision.
Sample use-case: Gigaom | Cycle Computing once again showcases Amazon’s high-performance computing potential
There are also services like Spotinst that can help you provision servers at the lowest possible cost.
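For the Spot route, the request shape looks roughly like this. The AMI ID, price, and count are placeholders, and actually submitting the request needs a real Boto3 EC2 client with credentials.

```python
def spot_request_params(ami_id, instance_type, count, max_price):
    """Build a request_spot_instances payload; all values here are placeholders."""
    return {
        "SpotPrice": str(max_price),   # the most you are willing to pay per hour
        "InstanceCount": count,
        "Type": "one-time",
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
        },
    }

# import boto3
# ec2 = boto3.client("ec2")
# ec2.request_spot_instances(**spot_request_params("ami-12345678", "m4.large", 100, 0.05))
```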

Are Hadoop and Map/Reduce useful for BIG parallel processes?

I have a superficial understanding of Hadoop and Map/Reduce. I see it can be useful for running many instances of small independent processes. But can I use this infrastructure (with its fault tolerance, scalability and ease of use) to run BIG independent processes?
Let's say I want to run a certain analysis of the status of my company's clients (600 of them), and this analysis requires about 1 minute of processing per client, accessing a variety of static data, but the analysis of one client is not related to the others. So now I have 10 hours of centralized processing, but if I can distribute this processing over 20 nodes, I can expect to finish it in about half an hour (plus some overhead due to replication of data). And if I can rent 100 nodes on Amazon EC2 at an affordable price, it will be done in about 6 minutes, and that would radically change the usability of my analysis.
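That back-of-envelope arithmetic can be written out; the overhead argument is a stand-in for data replication and startup costs, which the estimate otherwise ignores:

```python
import math

def wall_clock_minutes(n_tasks, minutes_per_task, n_nodes, overhead_min=0):
    """Wall-clock estimate for independent, equally sized tasks."""
    waves = math.ceil(n_tasks / n_nodes)  # tasks each node processes in turn
    return waves * minutes_per_task + overhead_min

# 600 one-minute analyses: 1 node -> 600 min, 20 nodes -> 30 min, 100 nodes -> 6 min
```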
Is Hadoop the right tool to solve my problem? Can it run big Mapper processes that take 1 min each? If not, where should I look?
Thanks in advance.

AWS RDS Provisioned IOPS really worth it?

As I understand it, RDS Provisioned IOPS is quite expensive compared to standard I/O rate.
In the Tokyo region, the P-IOPS rate is $0.15/GB and $0.12/IOPS for a standard deployment. (Double the price for a Multi-AZ deployment...)
For P-IOPS, the minimum storage is 100 GB and the minimum IOPS is 1,000.
Therefore, the starting cost for P-IOPS is $135, excluding instance pricing.
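That starting figure is just the two minimums priced at the quoted rates:

```python
def piops_monthly_cost(gb, iops, gb_rate=0.15, iops_rate=0.12):
    """Storage plus IOPS charges at the single-AZ Tokyo rates quoted above,
    excluding instance pricing."""
    return gb * gb_rate + iops * iops_rate

# minimums for P-IOPS: 100 GB and 1,000 IOPS -> about $135/month
```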
For my case, using P-IOPS costs about 100X more than using standard I/O rate.
This may be a very subjective question, but please share your opinion.
For the most optimized database on RDS P-IOPS, would the performance be worth the price?
Or, given that the AWS site offers some insight into how P-IOPS can benefit performance: is there any actual benchmark?
SELF ANSWER
In addition to the answer that zeroSkillz wrote, I did some more research. However, please note that I am not an expert at reading database benchmarks. Also, both the benchmark and the answer were based on EBS.
According to an article written by "Rodrigo Campos", the performance does actually improve significantly.
From 1,000 IOPS to 2,000 IOPS, read/write performance (including random read/write) doubles. From what zeroSkillz said, a standard EBS block provides about 100 IOPS. Imagine the improvement when 100 IOPS goes up to 1,000 IOPS (the minimum for a P-IOPS deployment).
Conclusion
According to the benchmark, the performance/price seems reasonable. For performance critical situations, I guess some people or companies should choose P-IOPS even when they are charged 100X more.
However, if I were a financial consultant for a small or medium business, I would just scale up (CPU, memory) my RDS instances gradually until the performance/price matches P-IOPS.
OK. This is a hard question to answer because it doesn't mention the size of the allocated storage or any other details of the setup. We use RDS and it has its pluses and minuses. First: you can't use an ephemeral storage device with RDS. You can't even access the storage device directly when using the RDS service.
That being said, the storage medium for RDS is presumed to be a variant of Amazon's EBS. Performance for standard IOPS depends on the size of the volume, and many sources state that above 100 GB of storage Amazon starts to "stripe" the EBS volumes. This provides better average-case data access for both reads and writes.
We currently run about 300 GB of allocated storage and get 2k write IOPS and about 1k read IOPS roughly 85% of the time over a several-hour period. We use Datadog to log this, so we can actually see it. We've seen bursts of up to 4k write IOPS, but nothing sustained at that level.
The main symptom we see on the application side is lock contention when the write IOPS are not enough. The number and frequency of these in your application logs will tell you when you are exhausting the IOPS of standard RDS. You can also use a service like Datadog to monitor the IOPS.
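If you are not using Datadog, a sketch of the same kind of check against CloudWatch looks like this. The DB instance identifier is a placeholder, and `recent_write_iops` assumes configured AWS credentials.

```python
from datetime import datetime, timedelta, timezone

def average(datapoints):
    """Mean of CloudWatch datapoint averages; 0.0 when there is no data."""
    if not datapoints:
        return 0.0
    return sum(p["Average"] for p in datapoints) / len(datapoints)

def recent_write_iops(db_instance_id, hours=3):
    """Average WriteIOPS for an RDS instance over the last few hours."""
    import boto3  # needs AWS credentials at call time
    cw = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="WriteIOPS",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        StartTime=now - timedelta(hours=hours),
        EndTime=now,
        Period=300,                 # 5-minute datapoints
        Statistics=["Average"],
    )
    return average(resp["Datapoints"])
```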
The problem with provisioned IOPS is that it assumes a steady-state volume of reads/writes in order to be cost effective. This is almost never a realistic use case; elastic demand is exactly the kind of problem cloud services were meant to solve. The only assurance you get with P-IOPS is that a maximum throughput capability is reserved for you. If you don't use it, you still pay for it.
If you're ok with running replicas, we recommend running a read-only replica as a NON-RDS instance, and putting it on a regular EC2 instance. You can get better read-IOPS at a much cheaper price by managing the replica yourself. We even setup replicas outside AWS using stunnel and put SSD drives as the primary block device and we get ridiculous read speeds for our reporting systems - literally 100 times faster than we get from RDS.
I hope this gives some real-world detail. In short, in my opinion, unless you must ensure a certain level of throughput capability on a constant basis (or your application will fail), there are better alternatives to provisioned IOPS, including read-write splitting with read replicas, memcache, etc.
So, I just got off a call with an Amazon systems engineer, and he had some interesting insights related to this question (i.e. this is 2nd-hand knowledge).
Standard EBS volumes can handle bursty traffic well, but eventually throughput tapers off to about 100 IOPS. The engineer suggested several alternatives.
Some customers use multiple small EBS volumes and stripe them. This improves IOPS and is the most cost-effective option. You don't need to worry about mirroring because EBS is mirrored behind the scenes.
Some customers use the ephemeral storage on the EC2 (or RDS) instance and keep multiple slaves to "ensure" durability. Ephemeral storage is local and much faster than EBS. You can even use SSD-provisioned EC2 instances.
Some customers configure the master to use provisioned IOPS or SSD ephemeral storage, then use standard EBS storage for the slave(s). Expected performance is good, but failover performance is degraded (though still available).
Anyway, if you decide to use any of these strategies, I would re-check with Amazon to make sure I haven't forgotten any important steps. As I said before, this is 2nd-hand knowledge.

Optimal number of Resque workers for maximum performance

I am using Resque to achieve cheap parallelism in my academic research: I split huge tasks into relatively small independent portions and submit them to Resque. These tasks do some heavy lifting, extensively using both the database (MongoDB, if that matters) and the CPU.
All of this works extremely slowly: for my relatively small portion of the dataset, 1,000 jobs get created, and 14 hours of constant work by 2 workers is only enough to finish ~800 of them. As you might have already suspected, this speed is more than frustrating.
I have a quad-core processor (a Core i5 something, not high-end), and apart from the Mongo instance and the Resque workers, nothing else occupies the CPU for any considerable period of time.
Now that you know my story, all I am asking is: how do I squeeze the maximum out of this setup? I believe that 3 workers + 1 Mongo instance would quickly fill up all the cores, but at the same time Mongo doesn't have to work all the time.
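One rough way to reason about the worker count: while one worker waits on MongoDB, another can use the core, so IO-heavy jobs justify more workers than cores. The sketch below assumes you can measure what share of a job is CPU-bound; the 0.5 in the example is purely illustrative.

```python
def suggested_workers(cores, cpu_fraction):
    """Rough worker count for mixed CPU/IO jobs. cpu_fraction is the share
    of each job spent on the CPU (0 < cpu_fraction <= 1), measured, not guessed."""
    return max(cores, round(cores / cpu_fraction))

# 4 cores, jobs spending half their time waiting on the database -> 8 workers
```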

Rapid AWS autoscaling

How do you configure AWS autoscaling to scale up quickly? I've set up an AWS autoscaling group with an ELB. All is working well, except that it takes several minutes before the new instances are added and online. I came across the following in a post about Puppet and autoscaling:
The time to scale can be lowered from several minutes to a few seconds if the AMI you use for a group of nodes is already up to date.
http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
Is this true? Can time to scale be reduced to a few seconds? Would using puppet add any performance boosts?
I also read that smaller instances start quicker than larger ones:
Small Instance 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform with a base install of CentOS 5.3 AMI
Amount of time from launch of instance to availability:
Between 5 and 6 minutes us-east-1c
Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform with a base install of CentOS 5.3 AMI
Amount of time from launch of instance to availability:
Between 11 and 18 minutes us-east-1c
Both were started via the command line using Amazon's tools.
http://www.philchen.com/2009/04/21/how-long-does-it-take-to-launch-an-amazon-ec2-instance
I note that the article is old and my c1.xlarge instances are certainly not taking 18 minutes to launch. Nonetheless, would configuring an autoscaling group with 50 micro instances (with a scale-up policy of a 100% capacity increase) be more efficient than one with 20 large instances? Or perhaps creating two autoscaling groups, one of micros for quick launch time and one of large instances to add CPU grunt a few minutes later? All else being equal, how much quicker does a t1.micro come online than a c1.xlarge?
You can increase or decrease the reaction time of the autoscaler by playing with the "--cooldown" value (in seconds).
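Through Boto3 the same knob is the group's `DefaultCooldown`; the group name below is a placeholder.

```python
def cooldown_update(group_name, seconds):
    """Parameters for update_auto_scaling_group; the group name is a placeholder."""
    return {"AutoScalingGroupName": group_name, "DefaultCooldown": seconds}

# import boto3
# boto3.client("autoscaling").update_auto_scaling_group(**cooldown_update("my-asg", 60))
```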
Regarding the instance types to use, this mostly depends on the application type; a decision on this topic should be made after close performance monitoring and production tuning.
The time to scale can be lowered from several minutes to a few seconds if the AMI you use for a group of nodes is already up to date. This way, when Puppet runs on boot, it has to do very little, if anything, to configure the instance with the node's assigned role.
The advice here is about keeping your AMI (the snapshot of your operating system) as up to date as possible. This way, when autoscaling brings up a new machine, Puppet doesn't have to install lots of software as it normally would on a blank AMI; it may just need to pull some updated application files.
Depending on how much work your Puppet scripts do (apt-get install, compiling software, etc) this could save you 5-20 minutes.
The two other factors you have to worry about are:
How long it takes your load balancer to determine you need more resources (e.g. a policy that says "add new machines when CPU is above 90% for more than 5 minutes" will be less responsive, and more likely to lead to timeouts, than "add new machines when CPU is above 60% for more than 1 minute")
How long it takes to provision a new EC2 instance (smaller instance types tend to take shorter times to provision)
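The first factor is just the CloudWatch alarm feeding your scaling policy. A sketch of the two policies compared above (the alarm name is a placeholder, and the `AlarmActions` that tie the alarm to a policy are omitted):

```python
def scale_out_alarm(threshold_pct, minutes):
    """Parameters for cloudwatch.put_metric_alarm(): fire when average CPU
    stays above threshold_pct for the given number of minutes."""
    return {
        "AlarmName": "scale-out-cpu",          # placeholder name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Statistic": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": float(threshold_pct),
        "Period": 60,                          # evaluate every minute
        "EvaluationPeriods": minutes,          # sustained for this many minutes
    }

# import boto3
# responsive: 60% for 1 minute; sluggish: 90% for 5 minutes
# boto3.client("cloudwatch").put_metric_alarm(**scale_out_alarm(60, 1))
```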
How soon the ASG responds depends on 3 things:
1. Step - how much to increase by, as a percentage or a fixed number. With a large step you can increase rapidly; the ASG will launch the entire step in one go.
2. Cooldown period - this governs how soon the next increase can happen. If the previous increase is still within the defined cooldown period (in seconds), the ASG will wait and not take action for the next increase yet. A small cooldown period enables the next step sooner.
3. AMI type - how much time an AMI takes to launch depends on the type of AMI; many factors come into play. All things being equal, fully baked AMIs launch much faster.
