Rule of thumb for determining web server scale up and down?

What's a good rule of thumb for determining whether to scale up or down the number of cloud based web servers I have running? Are there any other metrics besides request execution time, processor utilization, available memory, and requests per second that should be monitored for this purpose? Should the weighted average, standard deviation or some other calculation be used for determining scale up or down? And finally, are there any particular values that are best for determining when to add or reduce server instances?

This question of dynamically allocating compute instances brings back memories of my control systems classes in engineering school. It seems we should be able to apply classical digital control systems algorithms (think PID loops and Z-transforms) to scaling servers. Spinning up a server instance is analogous to moving an engine throttle's stepper motor one notch to increase the fuel and oxygen rate in response to increased load. Respond too slowly and the performance is sluggish (overdamped), too quickly and the system becomes unstable (underdamped).
In both the compute and physical domains, the goal is to have resources match load. The good news is that compute systems are among the easier control systems to deal with, since having too many resources doesn't cause instability; it just costs money, similar to generating electricity in a system with a resistor bank to burn off excess.
It’s great to see how the fundamentals keep coming around again! We learned all that for a reason.
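To make the control-loop analogy concrete, here is a minimal sketch of a proportional "throttle notch" scaling loop; the target utilization, gain, deadband, and the get_avg_cpu()/set_instance_count() hooks are all illustrative assumptions, and a full PID controller would add integral and derivative terms on top of this.

```python
# Minimal sketch of a proportional scaling loop (illustrative only).
# get_avg_cpu() and set_instance_count() are hypothetical hooks into your
# monitoring and provisioning APIs; gains and thresholds would need tuning.
import time

TARGET_CPU = 0.60                 # aim for 60% average utilization
DEADBAND = 0.10                   # ignore small errors to avoid flapping
GAIN = 10.0                       # instances added/removed per unit of error
MIN_INSTANCES, MAX_INSTANCES = 2, 50

def get_avg_cpu() -> float:
    """Hypothetical: fleet-wide average CPU utilization, 0.0 - 1.0."""
    raise NotImplementedError

def set_instance_count(n: int) -> None:
    """Hypothetical: ask the cloud provider to run n instances."""
    raise NotImplementedError

instances = MIN_INSTANCES
while True:
    error = get_avg_cpu() - TARGET_CPU          # positive => under-provisioned
    if abs(error) > DEADBAND:
        instances += round(GAIN * error)        # one "notch" per control step
        instances = max(MIN_INSTANCES, min(MAX_INSTANCES, instances))
        set_instance_count(instances)
    time.sleep(300)  # a long control period acts as damping against oscillation
```

The control period and deadband do the damping here: react too quickly or to every tiny fluctuation and the fleet oscillates, exactly like the underdamped case above.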

Your question is an active research area right now. That said, cloud providers can automate web-server scaling in different ways. For the details of how it works and which metrics affect scaling up and down, you can glance at this paper.
Amazon has announced Elastic Beanstalk, which lets you deploy an application to Amazon’s EC2 (Elastic Compute Cloud) and have it scale up or down, by launching or terminating server instances, according to demand. There is no additional cost for using Elastic Beanstalk; you are charged for the instances you use.
Also, you can check out Auto Scaling, which Amazon AWS offers.
Auto Scaling allows you to scale your Amazon EC2 capacity automatically up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you're using increases seamlessly during demand spikes to maintain performance and decreases automatically during demand lulls to minimize costs. Auto Scaling is particularly well suited for applications that experience hourly, daily, or weekly variability in usage. Auto Scaling is enabled by Amazon CloudWatch and available at no additional charge beyond Amazon CloudWatch fees.
I recommend reading the details from Amazon AWS to dig into how their systems scale web servers up and down.
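To see how those pieces fit together in practice, here is a minimal sketch that attaches a scale-out policy to an existing Auto Scaling group and triggers it from a CloudWatch CPU alarm using boto3; the group name, alarm name, and thresholds are placeholders, not recommendations.

```python
# Minimal sketch: wire a CloudWatch CPU alarm to an Auto Scaling policy.
# The group name, alarm name, and thresholds below are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

GROUP = "my-web-asg"  # placeholder: an existing Auto Scaling group

policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName=GROUP,
    PolicyName="scale-out-on-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,   # add one instance each time the alarm fires
    Cooldown=300,          # wait 5 minutes before reacting again
)

cloudwatch.put_metric_alarm(
    AlarmName="my-web-asg-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": GROUP}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,             # i.e. sustained for 10 minutes
    Threshold=70.0,                  # percent CPU
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```

A mirror-image policy with ScalingAdjustment=-1 and a low-CPU alarm handles scaling back down.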

Related

For long-running, important apps, which AWS EC2 instance type should I select?

I have been checking all the EC2 instance options, and of all the types I'm debating between choosing a reserved instance or a dedicated instance.
Spot and on-demand don't meet the requirements, I think.
I don't have more information about the type of app to be run.
It sounds like you are concerned with the availability (or up time) of different EC2 instances. The difference between reserved, dedicated, spot and on-demand has more to do with cost than availability. AFAIK AWS does not guarantee different levels of availability for different EC2 instance types or cost structures.
Dedicated instances are ones that won't run on the same hardware as instances from other AWS accounts, but they can run on the same hardware as other non-dedicated instances in the same account. Dedicated instances shouldn't have any effect on availability.
The other options (reserved, spot and on-demand) are just different cost structures. They don't affect performance or availability.
AWS advertises 99.99% uptime. This is from their SLA:
AWS will use commercially reasonable efforts to make the Included Services each available for each AWS region with a Monthly Uptime Percentage of at least 99.99%, in each case during any monthly billing cycle (the “Service Commitment”). In the event any of the Included Services do not meet the Service Commitment, you will be eligible to receive a Service Credit as described below.
So, any instance will be appropriate for important long-running apps; you just need to select an instance type that is large enough and pick the right cost structure. You probably don't need dedicated instances, unless you have determined that steal time is going to impact your app's performance.
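For context, a quick back-of-the-envelope calculation of what a 99.99% monthly uptime commitment actually allows (assuming a 30-day month):

```python
# Allowed downtime under a 99.99% monthly uptime SLA, assuming a 30-day month.
minutes_per_month = 30 * 24 * 60                     # 43,200 minutes
allowed_downtime = minutes_per_month * (1 - 0.9999)  # minutes the SLA tolerates
print(f"{allowed_downtime:.1f} minutes of downtime per month")  # ~4.3 minutes
```

So the SLA budgets only a few minutes of downtime per month, regardless of which cost structure you pick.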
I don't think it's Instance Types you need to worry about - they are just different sizes of machines that have different ratios of CPU/RAM/IO. High availability can only be achieved by designing it into both the application and infrastructure. Reservation types (spot/on-demand/dedicated) are more about cost optimisation based on how long and how much capacity you need.
Spot instances are transient - you can reserve a spot for up to 6 hours, then it will be terminated. Probably not a good fit, but it depends on how long "long running" is. You get a significant discount for using spot instances if you can live with the limitations.
On-Demand and Reserved are basically the same service, but on-demand is pay-as-you-go, whereas a reserved instance is a fixed-length contract with upfront payment. Reserved instances give you the best value for money over the long term, but on-demand gives you the flexibility to change instance size or turn off the instance to save money.
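To make that trade-off concrete, here is a quick break-even sketch; all the prices are hypothetical placeholders, so plug in current numbers from the AWS pricing page for your instance type and region.

```python
# Break-even sketch for reserved vs on-demand pricing (hypothetical prices).
ON_DEMAND_HOURLY = 0.10    # $/hour, placeholder
RESERVED_UPFRONT = 300.00  # $ one-time payment for a 1-year term, placeholder
RESERVED_HOURLY = 0.04     # $/hour under the reservation, placeholder

hours_per_month = 730
for months in (3, 6, 9, 12):
    on_demand = ON_DEMAND_HOURLY * hours_per_month * months
    reserved = RESERVED_UPFRONT + RESERVED_HOURLY * hours_per_month * months
    cheaper = "reserved" if reserved < on_demand else "on-demand"
    print(f"{months:>2} months: on-demand ${on_demand:7.2f} vs reserved ${reserved:7.2f} -> {cheaper}")
```

With these made-up numbers the reservation pays for itself around month seven or eight; the general point is simply that reserved wins when you know the instance will run for most of the term.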
Going shared vs Dedicated is generally a compliance question as it gets a lot more expensive to reserve your own hardware to run just your own instances.

Distributed computation on Cloud Foundry with the help of auto-scaling

I have a computation-intensive, long-running task. It can easily be split into sub-tasks, and it would also be fairly easy to aggregate the results later on. For example, Map/Reduce would work well.
I have to solve this on Cloud Foundry, and there I want to take advantage of auto-scaling, that is, the creation of additional instances due to high CPU load. Normally I use Spring Boot for developing my CF apps.
Any ideas are welcome on how to divide and conquer in an elastic way on CF. It would be great to have as many instances created as CF decides, without needing to configure the number of available application instances in the application. I also need to trigger the creation of instances by loading the CPUs to provoke auto-scaling.
I have to solve this on Cloud Foundry
It sounds like you're on the right track here. The main thing is that you need to write your app so that it can coexist with multiple instances of itself (or perhaps break it into a primary node that coordinates work and multiple worker apps). However you architect the app, being able to scale up instances is critical. You can then simply run cf scale to add or remove nodes and adjust capacity.
If you wanted to get clever, you could set up a pipeline to run your jobs. Step one would be to scale up the worker nodes of your app, step two would be to schedule the work to run, step three would be to clean up and scale down your nodes.
I'm suggesting this because manual scaling is going to be the simplest path forward (please read on for why).
and there I want to take advantage of auto-scaling, that is, the creation of additional instances due to high CPU load.
As to autoscaling, I think it's possible, but I also think it's making the problem more complicated than it needs to be. Auto scaling by CPU on Cloud Foundry is not as simple as it seems. The way Linux reports CPU usage, you can exceed 100%; it's 100% per CPU core. Pair this with the fact that you may not know how many CPU cores are on your Cells (for example, if you're using a public CF provider), and the fact that the number of cores could change over time (if your provider changes hardware), and it becomes difficult to know at what point you should scale your application.
If you must autoscale, I would suggest trying to autoscale on some other metric. What metrics are available will depend on the autoscaler tool you are using. The best would be if you could use some custom metric; then you could use work queue length or something else that's relevant to your application. If custom metrics are not supported, you could always hack together your own autoscaler that works with metrics relevant to your application (you can scale up and down by adjusting the instance count of your app using the CF API), as sketched below.
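For illustration only, a minimal sketch of such a homegrown autoscaler that shells out to the cf CLI; the app name, the jobs-per-instance target, and get_queue_length() are hypothetical, and the cf CLI would need to be logged in for this to work.

```python
# Minimal homegrown autoscaler sketch (illustrative only).
# Assumptions: the app is named "worker", get_queue_length() is a hypothetical
# function that reads your work-queue depth, and the cf CLI is already logged in.
import math
import subprocess
import time

APP_NAME = "worker"           # hypothetical app name
JOBS_PER_INSTANCE = 50        # hypothetical scaling target
MIN_INSTANCES, MAX_INSTANCES = 1, 20

def get_queue_length() -> int:
    """Hypothetical: query your work queue (RabbitMQ, Redis, a DB table, ...)."""
    raise NotImplementedError

def scale_to(instances: int) -> None:
    # "cf scale APP -i N" adjusts the instance count of a running app.
    subprocess.run(["cf", "scale", APP_NAME, "-i", str(instances)], check=True)

while True:
    desired = math.ceil(get_queue_length() / JOBS_PER_INSTANCE)
    desired = max(MIN_INSTANCES, min(MAX_INSTANCES, desired))
    scale_to(desired)
    time.sleep(60)  # re-evaluate once a minute
```

The same loop could call the CF API directly instead of the CLI, but the idea is identical: derive a desired instance count from a metric you control and push it to the platform.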
You might also be able to hack together a solution based on the metrics that your autoscaler does provide. For example, you could artificially inflate a metric that your autoscaler does support in proportion to the workload you need to process.
You could also just scale up when your work day starts and scale down at the end of the day. It's not dynamic, but it's simple and it will get you some efficiency improvements.
Hope that helps!

Does Google Compute Engine use a CDN?

I have already used a Google Compute instance, and it's located in us-central.
I live in Taiwan, and pinging the instance gives an average time of 180~210ms.
Amazon EC2 located in Singapore averages 70~80ms.
I think this latency difference depends on where your server is located, right?
So I guess Google Compute Engine doesn't support a CDN, right?
Even Amazon EC2 is the same.
Kind Regards,
PinLiang
Google Compute runs code while a CDN delivers content (**C**ontent **D**elivery **N**etwork), so they aren't the same thing. If you get better latency to Amazon EC2 then use that instead, but be aware that Google Compute and EC2 work very differently, and you won't be able to run the same code on both.
If you want low-latency (to Taiwan) compute resources, you might want to consider using Compute Engine instances in the Asia zones; see: 4/14/2014 - Google Cloud Platform expands to Asia
Yes, location and network connectivity will determine your latency. This is not always obvious though. Submarine cables tend to take particular paths. In some cases a geographically closer location may have higher latency.
A CDN is generally used for distributing static files at lower latency to more users. CloudFront can use any site as a custom origin.
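If you want to compare latency from your own location yourself, here is a minimal sketch that times TCP connections; the hostnames below are placeholders you would replace with your actual instances, and TCP connect time is only a rough stand-in for ping.

```python
# Rough latency comparison via TCP connect time (hostnames are placeholders).
import socket
import time

ENDPOINTS = {
    "us-central (GCE)": ("your-gce-instance.example.com", 80),  # placeholder
    "Singapore (EC2)": ("your-ec2-instance.example.com", 80),   # placeholder
}

for name, (host, port) in ENDPOINTS.items():
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; close immediately
    print(f"{name}: {(time.monotonic() - start) * 1000:.0f} ms")
```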

AWS RDS Provisioned IOPS really worth it?

As I understand it, RDS Provisioned IOPS is quite expensive compared to standard I/O rate.
In the Tokyo region, the P-IOPS rate is $0.15/GB and $0.12/IOPS for a standard deployment. (Double the price for a Multi-AZ deployment...)
For P-IOPS, the minimum required storage is 100GB and the minimum IOPS is 1000.
Therefore, the starting cost for P-IOPS is $135, excluding instance pricing.
For my case, using P-IOPS costs about 100X more than using standard I/O rate.
This may be a very subjective question, but please give some opinion.
In the most optimized database for RDS P-IOPS, would the performance be worth the price?
or
The AWS site gives some insights on how P-IOPS can benefit the performance. Is there any actual benchmark?
SELF ANSWER
In addition to the answer that zeroSkillz wrote, I did some more research. However, please note that I am not an expert on reading database benchmarks. Also, the benchmark and the answer were based on EBS.
According to an article written by "Rodrigo Campos", the performance does actually improve significantly.
From 1000 IOPS to 2000 IOPS, the read/write (including random read/write) performance doubles. From what zeroSkillz said, the standard EBS block provides about 100 IOPS. Imagine the improvement in performance when 100 IOPS goes up to 1000 IOPS (which is the minimum IOPS for a P-IOPS deployment).
Conclusion
According to the benchmark, the performance/price seems reasonable. For performance critical situations, I guess some people or companies should choose P-IOPS even when they are charged 100X more.
However, if I were a financial consultant in a small or medium business, I would just scale up (as in CPU and memory) my RDS instances gradually until the performance/price matches P-IOPS.
OK. This is a bad question because it doesn't mention the size of the allocated storage or any other details of the setup. We use RDS and it has its pluses and minuses. First, you can't use an ephemeral storage device with RDS. You can't even access the storage device directly when using the RDS service.
That being said - the storage medium for RDS is presumed to be based on a variant of EBS from amazon. Performance for standard IOPS depends on the size of the volume and there are many sources stating that above 100GB storage they start to "stripe" EBS volumes. This provides better average case data access both on read and write.
We currently run about 300GB of allocated storage and can get 2k write IOPS and 1k read IOPS about 85% of the time over a several-hour period. We use Datadog to log this so we can actually see it. We've seen bursts of up to 4k write IOPS, but nothing sustained like that.
The main symptom we see from the application side is lock contention if the IOPS for writing is not enough. The number and frequency of these in your application logs will tell you when you are exhausting the IOPS of standard RDS. You can also use a service like Datadog to monitor the IOPS.
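If you don't run Datadog, you can pull the same numbers straight from CloudWatch; here is a minimal sketch with boto3, where the DB instance identifier is a placeholder.

```python
# Minimal sketch: pull recent WriteIOPS for an RDS instance from CloudWatch.
# Assumes boto3 is configured with credentials; "mydb" is a placeholder identifier.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="WriteIOPS",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb"}],  # placeholder
    StartTime=now - timedelta(hours=3),
    EndTime=now,
    Period=300,                        # 5-minute buckets
    Statistics=["Average", "Maximum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"avg={point['Average']:.0f}", f"max={point['Maximum']:.0f}")
```

The same query with MetricName="ReadIOPS" covers the read side, which is enough to see whether you are actually hitting the standard-storage ceiling before paying for P-IOPS.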
The problem with provisioned IOPS is that it assumes a steady-state volume of writes/reads in order to be cost effective. That is almost never a realistic use case, and it is the kind of problem cloud services were supposed to fix in the first place. The only assurance you get with P-IOPS is that a maximum throughput capability is reserved for you. If you don't use it, you still pay for it.
If you're OK with running replicas, we recommend running a read-only replica as a NON-RDS instance, putting it on a regular EC2 instance. You can get better read IOPS at a much cheaper price by managing the replica yourself. We even set up replicas outside AWS using stunnel, with SSD drives as the primary block device, and we get ridiculous read speeds for our reporting systems - literally 100 times faster than we get from RDS.
I hope this helps give some real-world details. In short, in my opinion, unless you must ensure a certain level of throughput capability on a constant basis (or at any given point), or your application will fail, there are better alternatives to provisioned IOPS, including read-write splitting with read replicas, memcache, etc.
So, I just got off a call with an Amazon System Engineer, and he had some interesting insights related to this question (i.e. this is second-hand knowledge).
Standard EBS blocks can handle bursty traffic well, but eventually they taper off to about 100 IOPS. There were several alternatives that this engineer suggested.
Some customers use multiple small EBS blocks and stripe them. This improves IOPS and is the most cost-effective option. You don't need to worry about mirroring because EBS is mirrored behind the scenes.
Some customers use the ephemeral storage on the EC2 instance (or RDS instance) and have multiple slaves to "ensure" durability. The ephemeral storage is local storage and much faster than EBS. You can even use SSD-provisioned EC2 instances.
Some customers configure the master to use provisioned IOPS or SSD ephemeral storage, then use standard EBS storage for the slave(s). Expected performance is good, but failover performance is degraded (though still available).
Anyway, if you decide to use any of these strategies, I would recheck with Amazon to make sure I haven't forgotten any important steps. As I said before, this is second-hand knowledge.

Capacity planning and estimating costs with Amazon Web Services for 1M users

I'm in the planning stages of estimating server costs for my web application. How can I determine how many Amazon EC2 instances I will need to handle a database-backed web application with 1M active users? How should I go about filling out this monthly calculator on Amazon's site?
http://calculator.s3.amazonaws.com/calc5.html
The web application will be somewhat akin to a social networking site. The data transfers will most likely be small, but there will be anywhere from 100,000 to 500,000 of them from users to the servers on a daily basis.
To get an accurate estimate of your costs, you will have to understand the application architecture, the usage patterns, and how many servers (instances), how much storage, and how much data transfer you expect to use.
Take a look at this video: https://www.youtube.com/watch?v=PsEX3W6lHN4&list=PLhr1KZpdzukcAtqFF32cjGUNNT5GOzKQ8 It might help you understand how to use the calculator and fill in the different values.
Jin
Capacity planning is on you; it's specific to your app, so nobody, including Amazon, can suggest much in that regard. Regarding cost estimation, yes, you can use the monthly calculator. The only thing I would suggest is that when you do your capacity planning, do your homework on which AWS services you are going to use. For each service, find out the unit of measure used for pricing. Once you know that, do your capacity planning accordingly to project how much you will use of each unit of measure for each service on a monthly basis. One exception: since you can reserve instances for one or three years with upfront fees, you might want to spread that cost across the term you choose.
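As a starting point, here is a back-of-the-envelope sketch of that kind of monthly estimate; every unit price below is a hypothetical placeholder, so substitute the real rates for your region and the services you actually pick before trusting the total.

```python
# Back-of-the-envelope monthly cost sketch (all unit prices are placeholders;
# take real numbers from the AWS pricing pages / the monthly calculator).
INSTANCE_HOURLY_RATE = 0.10     # $/hour per web/app instance (placeholder)
DB_INSTANCE_HOURLY_RATE = 0.25  # $/hour for the database instance (placeholder)
STORAGE_RATE_PER_GB = 0.10      # $/GB-month (placeholder)
TRANSFER_RATE_PER_GB = 0.09     # $/GB out to the internet (placeholder)

num_instances = 4               # assumed web/app fleet size
storage_gb = 500                # assumed database + asset storage
requests_per_day = 500_000      # upper end of the stated range
avg_response_kb = 50            # assumed average response size

hours_per_month = 730
transfer_gb = requests_per_day * 30 * avg_response_kb / (1024 * 1024)

monthly_cost = (
    num_instances * INSTANCE_HOURLY_RATE * hours_per_month
    + DB_INSTANCE_HOURLY_RATE * hours_per_month
    + storage_gb * STORAGE_RATE_PER_GB
    + transfer_gb * TRANSFER_RATE_PER_GB
)
print(f"Estimated data transfer: {transfer_gb:.0f} GB/month")
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")
```

Once you have real unit prices and measured per-user numbers, this same arithmetic is what the Amazon calculator does for you service by service.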
