Choose Amazon EC2 Instance Types - performance

Which Amazon EC2 instance type should I choose for an application that only receives JSON, transforms it, saves it to a database, and returns JSON?
Java (Spring) + PostgreSQL.
Expected load: 10k req/sec.

Your application is CPU bound, so you should choose a compute-optimized instance type; C4 is the latest generation in the compute-optimized family.
I had a similar application requirement, and with a c4.xlarge I could sustain 40k requests/min on a single server within an SLA of 10 ms per request. You can also benchmark your application by running a stress test against the different sizes in the C4 family.
You should check out the AWS documentation at https://aws.amazon.com/ec2/instance-types/ on the different instance types and their use cases.
You can also check the CPU usage on your instance by looking at the CloudWatch metrics or by running the top command on your Linux instance.
Make sure that your instance does not exceed 75% CPU utilization.
You can start with a smaller instance and gradually move up to larger sizes in the C4 family if CPU utilization becomes the bottleneck. This is how I found the right instance type for my application while keeping server time within the 10 ms SLA.
P.S.: in my case the DB was also deployed on the same server, so throughput was lower; it will increase if you run the DB on a separate server.
Let me know if you need any other info.
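As a rough sanity check, the 40k requests/min benchmark figure above can be extrapolated to the asker's 10k req/sec target. This is back-of-envelope arithmetic only, not a sizing guarantee; real throughput depends on the workload:

```java
// Back-of-envelope sizing from the benchmark figure quoted above:
// 40k requests/minute per c4.xlarge, target 10k requests/second.
public class CapacityEstimate {
    // Requests per second one server sustains, given its per-minute rate
    static double perServerRps(int requestsPerMinute) {
        return requestsPerMinute / 60.0;
    }

    // Servers needed for a target rate, using exact integer ceiling division
    static int serversNeeded(int targetRps, int perServerRpm) {
        int targetRpm = targetRps * 60;
        return (targetRpm + perServerRpm - 1) / perServerRpm;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f req/sec per server%n", perServerRps(40_000)); // ~666.7
        System.out.println(serversNeeded(10_000, 40_000) + " servers");       // 15
    }
}
```

So if the benchmark held, roughly 15 such servers (or fewer, larger instances) would be needed, before accounting for the database moving to its own server.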

Let's say that every request requires 20 ms of CPU processing time (thus not counting the waits between I/O operations); each core will then be able to process around 50 requests per second. To process 10k requests per second you will need 200 cores, which can be achieved with 16 instances of 16 vCPUs each.
Having said that, you can select the right instance for your needs using an EC2 selector tool. For instance:
these are all the instance types matching the 16 x 16 vCPU configuration for less than $10k/year;
if, otherwise, you're fine with "just" 64 cores in total, then take a look at these.
If you have other constraints, or if my assumptions weren't correct, you can change the filters accordingly and choose the type that best suits your needs.
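The sizing arithmetic above can be written out explicitly. The 20 ms CPU time per request is the assumption taken straight from the answer; substitute your own measured value:

```java
// Core-count sizing from CPU time per request (20 ms is an assumed figure).
public class CoreSizing {
    // Requests per second a single core can process, given CPU ms per request
    static double rpsPerCore(double cpuMillisPerRequest) {
        return 1000.0 / cpuMillisPerRequest;
    }

    // Cores needed to sustain a target request rate, rounded up
    static int coresNeeded(double targetRps, double cpuMillisPerRequest) {
        return (int) Math.ceil(targetRps / rpsPerCore(cpuMillisPerRequest));
    }

    public static void main(String[] args) {
        System.out.println(rpsPerCore(20));          // 50 req/sec per core
        System.out.println(coresNeeded(10_000, 20)); // 200 cores
    }
}
```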

Related

EC2 host type for a DynamoDB batchWrite call

I have a requirement to bulk-upload an Excel sheet to a DynamoDB table, and the maximum number of rows is 200,000. The bulk-upload website will be used infrequently, so we can assume only 1-2 bulk uploads are being processed at a given time. In the backend, I am using the Apache POI API to parse the Excel sheet into DynamoDB items.
Because we can only send up to 25 items per batchWriteItem call, the current latency is around 15 minutes (900 seconds) to upload all 200,000 items. Hence I am planning to implement multi-threading to execute multiple batchWriteItem API calls in parallel. Can you help me understand which EC2 host types are best suited for multi-threading for this purpose?
Any references would be really helpful.
Normally, multi-threading benefits from an instance type with multiple vCPUs.
However, you are describing behaviour that waits on the network rather than the CPU, so the operation you describe is likely not heavily impacted by CPU utilization.
The best way to answer your question is to recommend that you experiment with different instance types to find the one that is best for your application's combination of needs:
Pick an instance family (eg m5) and try a few different sizes
Compare this against another family (eg c5) to see whether the improved performance is worth the extra cost
Monitor the application to find the bottleneck, which would either be RAM, CPU, Network or Disk access
Please note that smaller instances have less Network bandwidth, so you might need to choose a larger instance type to avoid being throttled on network bandwidth. This might result in excess CPU that isn't being fully utilized.
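For reference, the parallel upload the question describes can be sketched as below: split the parsed rows into 25-item batches (the BatchWriteItem limit) and fan them out over a fixed thread pool. The writeBatch method is a placeholder for the real DynamoDB batchWriteItem call (which must also retry UnprocessedItems), and the pool size of 16 is an assumption to tune against the instance type chosen:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBatchWriter {
    static final int BATCH_SIZE = 25; // DynamoDB BatchWriteItem hard limit

    // Split items into batches of at most BATCH_SIZE
    static <T> List<List<T>> partition(List<T> items) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += BATCH_SIZE) {
            batches.add(items.subList(i, Math.min(i + BATCH_SIZE, items.size())));
        }
        return batches;
    }

    // Fan the batches out over a fixed-size thread pool
    static void uploadAll(List<String> rows, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> batch : partition(rows)) {
            pool.submit(() -> writeBatch(batch));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.MINUTES);
    }

    static void writeBatch(List<String> batch) {
        // placeholder: issue the real batchWriteItem call here,
        // re-submitting any UnprocessedItems the response returns
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) rows.add("row-" + i);
        System.out.println(partition(rows).size() + " batches"); // 8000 batches
        uploadAll(rows, 16);
    }
}
```

Since the threads spend most of their time waiting on the network, a modest vCPU count with more in-flight batches usually goes further than a bigger CPU; watch the table's write-capacity throttling rather than the instance's CPU.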

How to measure resource usage on partitionlevel in Service Fabric?

Service Fabric gives us the tools to create custom metrics and capacities, so we can each build our own resource model that the resource balancer acts on at runtime. I would like to monitor and use physical resources such as memory, CPU, and disk usage. This works fine as long as we keep using the default load.
But load is not static for a service/actor, so I would like to use the built-in dynamic load reporting. This is where I run into a problem: ReportLoad works at the level of partitions, but partitions all live within the same process on a node. Every method for monitoring physical resources I have found uses the process as the smallest unit of measurement, such as PerformanceCounter. If that value were used, there could be hundreds of partitions reporting the same load, a load which is not representative of any single partition.
So the question is: how can resource usage be measured at the partition level?
Not only are service instances and replicas hosted in the same process, but they also share a thread pool by default in .NET! Every time you create a new service instance, the platform actually just creates an instance of your service class (the one that derives from StatefulService or StatelessService) inside the host process. This is great because it's fast, cheap, and you can pack a ton of services into a single host process and on to each VM or machine in your cluster.
But it also means resources are shared, so how do you know how much each replica of each partition is using?
The answer is that you report load on virtual resources rather than physical resources. The idea is that you, the service author, can keep track of some measurement about your service, and you formulate metrics from that information. Here is a simple example of a virtual resource that's based on physical resources:
Suppose you have a web service. You run a load test on your web service and you determine the maximum requests per second it can handle on various hardware profiles (using Azure VM sizes and completely made-up numbers as an example):
A2: 500 RPS
D2: 1000 RPS
D4: 1500 RPS
Now when you create your cluster, you set your capacities accordingly based on the hardware profiles you're using. So if you have a cluster of D2s, each node would define a capacity of 1000 RPS.
Then each instance (or replica if stateful) of your web service reports an average RPS value. This is a virtual resource that you can easily calculate per instance/replica. It corresponds to some hardware profile, even though you're not reporting CPU, network, memory, etc. directly. You can apply this to anything that you can measure about your services, e.g., queue length, concurrent user count, etc.
If you don't want to define a capacity as specific as requests per second, you can take a more general approach by defining physical-ish capacities for common resources, like memory or disk usage. But what you're really doing here is defining usable memory and disk for your services rather than total available. In your services you can keep track of how much of each capacity each instance/replica uses. But it's not a total value, it's just the stuff you know about. So for example if you're keeping track of data stored in memory, it wouldn't necessarily include runtime overhead, temporary heap allocations, etc.
I have an example of this approach in a Reliable Collection wrapper I wrote that reports load metrics strictly on the amount of data you store by counting bytes: https://github.com/vturecek/metric-reliable-collections. It doesn't report total memory usage, so you have to come up with a reasonable estimate of how much overhead you need and define your capacities accordingly, but at the same time by not reporting temporary heap allocations and other transient memory usage, the metrics that are reported should be much smoother and more representative of the actual data you're storing (you don't necessarily want to re-balance the cluster simply because the .NET GC hasn't run yet, for example).
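The "virtual resource" idea (each replica reporting its own requests-per-second average as a load metric) can be sketched language-neutrally. Service Fabric's actual ReportLoad API is .NET, so the Java version below is an illustration of the metric bookkeeping only; each instance/replica would keep a counter like this and periodically hand the drained average to the platform's load-reporting call:

```java
import java.util.concurrent.atomic.AtomicLong;

// Per-replica RPS tracker: a "virtual resource" computed by the service itself.
public class RpsLoadMetric {
    private final AtomicLong requestCount = new AtomicLong();
    private volatile long windowStartMillis;

    public RpsLoadMetric(long nowMillis) {
        this.windowStartMillis = nowMillis;
    }

    // Called once per handled request
    public void recordRequest() {
        requestCount.incrementAndGet();
    }

    // Called periodically; returns average RPS over the window and resets it.
    // The returned value is what you'd report to the platform as the load metric.
    public long drainAverageRps(long nowMillis) {
        long elapsedMillis = Math.max(1, nowMillis - windowStartMillis);
        long count = requestCount.getAndSet(0);
        windowStartMillis = nowMillis;
        return count * 1000 / elapsedMillis;
    }

    public static void main(String[] args) {
        RpsLoadMetric metric = new RpsLoadMetric(0);
        for (int i = 0; i < 5000; i++) metric.recordRequest();
        // 5000 requests over a 10-second window
        System.out.println(metric.drainAverageRps(10_000)); // 500 RPS
    }
}
```

Because each replica owns its own counter, the reported value is genuinely per-partition even though all replicas share one host process.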

Understanding RESTful Web Service stress test results

I'm trying to stress-test my Spring RESTful Web Service.
I run my Tomcat server on an Intel Core 2 Duo notebook with 4 GB of RAM. I know it's not a real server machine, but it's all I have and it's only for study purposes.
For the test, I run JMeter on a remote machine, communicating over a private WLAN through a central wireless router. I prefer to test over a wireless connection because the service would be accessed from mobile clients. With JMeter I run a group of 50 threads, starting one thread per second, so after 50 seconds all threads are running. Each thread repeatedly sends an HTTP request to the server containing a small JSON object to be processed, sleeping on each iteration for a constant 100 ms delay plus a Gaussian-distributed random value with a standard deviation of 100 ms. I use some JMeter plugins for the graphs.
Here are the results:
I can't figure out why my hits per second don't pass the 100 threshold (in the graph they are multiplied by 10), because with this configuration the rate should be higher (50 threads sending at least three times per second would generate 150 hits/sec). I don't get any error messages from the server, and all seems to work well. I've tried many more configurations, but I can't get more than 100 hits/sec.
Why?
[EDIT] Many times I notice a substantial performance degradation from some point on, without any visible cause: no error responses on the client, only OK HTTP responses, and everything seems to work well on the server too. But looking at the reports:
As you can see, something happens between 01:54 and 02:14: hits per second decrease and response times increase. Okay, it could be server overload, but then why does CPU usage decrease? That is not compatible with the congestion hypothesis.
I want to note that you've chosen the rows to display on the Composite Graph very well. They are enough to draw some conclusions:
Note that Hits Per Second correlates perfectly with CPU usage. This means you have a CPU-bound system, and the maximum performance is limited mostly by CPU. This is very important to remember: server resources are spent on hits, not on active users. You could disable your sleep timers entirely and would still see the same 80-90 hits/s.
The CPU tops out at around 80%, so I assume you run a Windows OS (Win7?) on your machine. In my experience it is difficult to reach 100% CPU utilization on a Windows machine; it just doesn't seem to spend the last 20%. And if you have reached that maximum, then you are seeing your installation's capacity limit: it simply doesn't have enough CPU resources to serve more requests. To fight this bottleneck you should either add CPU (use another server with better CPU hardware), configure the OS to let you use up to 100% (I don't know if that's possible), or optimize your system (code, OS settings) to spend less CPU per request.
For the second graph, I'd suppose something was being downloaded through the router, or something was happening on the JMeter machine. "Something happens" means some task was running; it may be your friend who just wanted to run a quick "grep error.log", or a scheduled task. To pin it down, look at the router's and the JMeter machine's resources during the degradation. There must be a process consuming CPU, disk, or network.
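The throughput ceiling in the question can also be reconciled numerically. In a closed workload like this one, each thread's rate is bounded by 1/(think time + response time), so the observed 100 hits/s implies a particular average response time. A sketch, where the 100 ms think time comes from the test setup above and the response times are illustrative:

```java
// Closed-workload throughput model: a thread can only issue its next
// request after its think time plus the server's response time.
public class ClosedLoadModel {
    // Max hits/sec for a closed workload of N threads
    static double maxHitsPerSec(int threads, double thinkSec, double responseSec) {
        return threads / (thinkSec + responseSec);
    }

    public static void main(String[] args) {
        // With instant responses, 50 threads and ~100 ms think time cap at 500 hits/s
        System.out.println(maxHitsPerSec(50, 0.1, 0.0)); // 500.0
        // An observed ceiling of ~100 hits/s implies ~400 ms average response time
        System.out.println(maxHitsPerSec(50, 0.1, 0.4)); // 100.0
    }
}
```

In other words, the 100 hits/s plateau is consistent with the server being CPU-saturated and response times stretching until the closed loop self-limits.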

Rapid AWS autoscaling

How do you configure AWS autoscaling to scale up quickly? I've setup an AWS autoscaling group with an ELB. All is working well, except it takes several minutes before the new instances are added and are online. I came across the following in a post about Puppet and autoscaling:
The time to scale can be lowered from several minutes to a few seconds if the AMI you use for a group of nodes is already up to date.
http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
Is this true? Can time to scale be reduced to a few seconds? Would using Puppet add any performance boost?
I also read that smaller instances start quicker than larger ones:
Small Instance 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform with a base install of CentOS 5.3 AMI
Amount of time from launch of instance to availability:
Between 5 and 6 minutes us-east-1c
Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform with a base install of CentOS 5.3 AMI
Amount of time from launch of instance to availability:
Between 11 and 18 minutes us-east-1c
Both were started via the command line using Amazon's tools.
http://www.philchen.com/2009/04/21/how-long-does-it-take-to-launch-an-amazon-ec2-instance
I note that the article is old and my c1.xlarge instances are certainly not taking 18 minutes to launch. Nonetheless, would configuring an autoscaling group with 50 micro instances (with a scale-up policy of a 100% capacity increase) be more efficient than one with 20 large instances? Or potentially creating two autoscaling groups, one of micros for quick launch time and one of large instances to add CPU grunt a few minutes later? All else being equal, how much quicker does a t1.micro come online than a c1.xlarge?
You can increase or decrease an autoscaler's reaction time by tuning the "--cooldown" value (in seconds).
Regarding the instance types to use: this depends mostly on the application, and a decision on this topic should be made after close performance monitoring and production tuning.
The time to scale can be lowered from several minutes to a few seconds if the AMI you use for a group of nodes is already up to date. This way, when Puppet runs on boot, it has to do very little, if anything, to configure the instance with the node's assigned role.
The advice here is talking about having your AMI (The snapshot of your operating system) as up to date as possible. This way, when auto scale brings up a new machine, Puppet doesn't have to install lots of software like it normally would on a blank AMI, it may just need to pull some updated application files.
Depending on how much work your Puppet scripts do (apt-get install, compiling software, etc) this could save you 5-20 minutes.
The two other factors you have to worry about are:
How long it takes your load balancer to determine you need more resources (e.g. a policy that dictates "new machines should be added when CPU is above 90% for more than 5 minutes" would be less responsive, and more likely to lead to timeouts, than "new machines should be added when CPU is above 60% for more than 1 minute")
How long it takes to provision a new EC2 instance (smaller instance types tend to take shorter times to provision)
How soon the ASG responds depends on three things:
1. Step size: how much to increase by (a percentage or a fixed number). With a large step you can scale up rapidly; the ASG will launch the entire step in one go.
2. Cooldown period: this governs how soon the next increase can happen. If the previous increase is still within the defined cooldown period (in seconds), the ASG will wait and take no action for the next increase yet. A small cooldown period enables the next step sooner.
3. AMI type: how long an AMI takes to launch depends on many factors. All things being equal, fully baked AMIs launch much faster.

AWS AutoScaling not working / CPU Utilization stays sub 30%

I have setup AWS AutoScaling as following:
1) created a Load Balancer and registered one instance with it;
2) added Health Checks to the ELB;
3) added 2 Alarms:
- CPU usage > 60% for 60s: spin up 1 instance;
- CPU usage < 40% for 120s: spin down 1 instance;
4) wrote a JMeter script to send traffic to the website in question: 250 threads, 200-second ramp-up time, loop count 5.
What I am seeing is very strange.
I expected the CPU usage to shoot up with the higher number of users. Instead, the CPU usage stays between 20-30% (which is why the new instance never fires up), and the running instance starts throwing timeout errors once it reaches anything more than 100 users.
I am at a loss to understand why CPU usage is so low when the website is in fact timing out.
Ideas?
This could be a problem with the ELB. The ELB does not scale very quickly; it takes a consistent amount of traffic for Amazon to know you need a bigger one. Hitting it really hard all at once does not help it scale, so the ELB could be struggling to handle all the connections.
Is this SSL? Are you doing SSL on the ELB? That would add overhead to an underscaled ELB as well.
I would honestly recommend not using the ELB at all. HAProxy is a much better product and much faster in most cases. I can elaborate if needed, but just look at how Amazon handles the CNAME versus what you can do with HAProxy...
It sounds like you are testing AutoScaling to ensure it will work for your needs. As a first pass to simply see if AS will launch a new instance, try reducing your CPU up check to trigger at 25%. I realize this is a lot lower than you are hoping to use moving forward, but it will help validate that your initial configuration is working.
As a second step, you should take a look at your application and see if CPU is the best metric to have AS monitor for scaling. It is possible that you have a bottleneck somewhere else in your app that may not necessarily be CPU related (web server tuning, memory, databases, storage, etc). You didn't mention what type of content you're serving out; is it static or generated by an interpreter (like PHP or something else)? You could also send your own custom metric data into CloudWatch and use this metric to trigger the scaling.
You may also want to time how long it takes for an instance to be ready to serve traffic from a cold start. If it takes longer than 60 seconds, you may want to adjust your monitoring threshold time appropriately (or set cool down periods). As chantheman pointed out, it can take some time for the ELB to register the instance as well (and a longer amount of time if the new instance is in a different AZ).
I hope all of this helps.
What we discovered is that when you use autoscaling with t2 instances under heavy load, those instances run out of CPU credits and are then limited to 20% of the CPU (from the monitoring point of view; internally, htop still shows 100%). Internally they are at maximum load.
This sends a false metric to Auto Scaling, and new instances will not fire.
You need to change the metric, develop your own, or move to m-family instances.
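The t2 behaviour described above can be quantified. One CPU credit is one vCPU-minute at 100%, and a t2.micro earns 6 credits per hour (a 10% baseline). The 144-credit starting balance below is an assumption for illustration; actual balances depend on accrual history:

```java
// Rough model of t2 CPU-credit depletion under sustained load.
public class CpuCreditModel {
    // Hours until the credit balance runs out, or infinity if the load
    // is at or below what the earn rate sustains.
    static double hoursUntilDepleted(double creditBalance,
                                     double earnRatePerHour,
                                     double vcpus,
                                     double utilization) {
        double burnPerHour = vcpus * utilization * 60.0; // credits burned/hour
        double netDrain = burnPerHour - earnRatePerHour;
        if (netDrain <= 0) return Double.POSITIVE_INFINITY; // sustainable load
        return creditBalance / netDrain;
    }

    public static void main(String[] args) {
        // 144 credits, earning 6/hour, 1 vCPU pinned at 100%:
        // burns 60/hour, net drain 54/hour -> ~2.67 hours until throttled
        System.out.println(hoursUntilDepleted(144, 6, 1, 1.0));
        // At 5% utilization (below the 10% baseline) the balance never depletes
        System.out.println(hoursUntilDepleted(144, 6, 1, 0.05)); // Infinity
    }
}
```

So under the load test described in the question, a t2 instance can look healthy for a couple of hours and then collapse to its baseline, which is exactly when the CloudWatch CPU metric starts lying to the scaling policy.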
