I'm using a small EC2 instance.
It's noticeably slower than my home Linux machine, which cost less than $800
(a fairly average machine, purchased 6 months ago).
I don't know whether the CPU or the hard disk is the bottleneck.
Is there a way to tell which?
Yes. To monitor your EC2 instance, consider using Amazon CloudWatch ( http://aws.amazon.com/cloudwatch/ ). It can monitor your instance's resources, such as CPU utilization, disk I/O, and network traffic. Memory usage is not reported by default and has to be published from inside the instance as a custom metric. It's also free within the AWS Free Tier.
If you're looking for more detailed monitoring, consider the Server Density service ( http://www.serverdensity.com/cloud-monitoring/ ). It can monitor software installed on the server itself, such as Apache.
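For a quick check from inside the instance itself, the sketch below compares two samples of the aggregate `cpu` line in `/proc/stat` on Linux. A high `user`/`system` share points at the CPU, a high `iowait` share points at the disk, and a high `steal` share means the hypervisor is giving CPU time to someone else. The function names and the 5-second interval are my own choices for illustration, not part of any AWS tooling.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    return dict(zip(names, map(int, fields[1:9])))


def cpu_breakdown(before, after):
    """Percentage of CPU time spent in each state between two samples."""
    deltas = {state: after[state] - before[state] for state in before}
    total = sum(deltas.values()) or 1  # avoid dividing by zero on identical samples
    return {state: 100.0 * ticks / total for state, ticks in deltas.items()}


def sample_breakdown(interval_seconds=5, path="/proc/stat"):
    """Take two samples of /proc/stat and return the breakdown (Linux only)."""
    with open(path) as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_cpu_line(f.readline())
    return cpu_breakdown(before, after)
```

Calling `sample_breakdown()` while the slow workload is running and looking at which of `user`, `iowait`, and `steal` dominates is usually enough to tell which direction to dig in.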
I ran a performance test on my server (1 ECU), but it only reached 1,000 users in testing. How many ECUs do I need for 15,000 users?
The ECU (EC2 Compute Unit) was a unit of measure designed to provide a relative measure of performance between Amazon EC2 instance types. For example, an m1.small instance had 1 ECU, an m1.large had 4 ECUs, etc.
However, it is no longer possible to summarize the power of an instance in a single number. Some instances have more RAM, some have more CPUs or more powerful CPUs, GPUs, enhanced networking and even burst capabilities.
Therefore, the ECU has slowly disappeared from AWS services and documentation. It can still be viewed as an optional column in the Amazon EC2 Launch Instance console.
The ECU is definitely not a good measure of "the number of users" that a system can support. The number of users that a system can support is totally dependent upon the application architecture and its system requirements. When testing the number of users a system can support, closely monitor all system components (e.g. CPU load, RAM utilization, disk queues) to identify the bottleneck. You can then try to modify the application or improve the bottleneck to provide better application performance.
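As a rough illustration of "identify the bottleneck", the sketch below walks through load-test samples in order of user count and reports the first resource to cross a saturation threshold. The sample figures and the 90% threshold are made up for the example; real numbers would come from your own monitoring during the test.

```python
def find_bottleneck(samples, threshold=90.0):
    """Return (users, resource) for the first resource to reach the threshold.

    samples: list of (concurrent_users, {resource_name: utilization_percent}).
    Returns None if nothing saturated during the test.
    """
    for users, utilization in sorted(samples, key=lambda s: s[0]):
        saturated = {name: pct for name, pct in utilization.items() if pct >= threshold}
        if saturated:
            # If several resources crossed the line together, report the worst one.
            return users, max(saturated, key=saturated.get)
    return None


# Hypothetical load-test results, not measurements from a real system.
load_test = [
    (250, {"cpu": 35.0, "ram": 40.0, "disk": 20.0}),
    (500, {"cpu": 60.0, "ram": 55.0, "disk": 30.0}),
    (1000, {"cpu": 97.0, "ram": 70.0, "disk": 45.0}),
]
print(find_bottleneck(load_test))
```

With these invented numbers the CPU saturates at 1,000 users, which would suggest a compute-heavier instance type or a change to the application, rather than simply multiplying ECUs by 15.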
I am interested in understanding how the hardware resources (CPU, disk, network, etc.) of an AWS physical server are shared between different applications. Has anyone experienced inexplicable performance changes in services running on AWS that they successfully attributed to another application sharing the physical resources? If so, how did you go about debugging this?
In particular, I am interested in more complicated interactions between the resources, such as CPU->Memory bandwidth. If you run 15 VMs on a single machine, you will surely have worse performance than if you ran 2 VMs.
Perhaps this is a more general question about Xen virtualization, but I don't know if there is some kind of AWS magic happening under the hood that I don't know about.
I am not sure if this is the right forum for this kind of question; if not, it would be helpful if you could point me towards a resource or another forum.
Amazon EC2 instances are not susceptible to "noisy neighbour" problems.
Based upon the Instance Type selected, the EC2 instance receives CPU, Memory and (for some instance types) locally attached disk storage. These resources are dedicated to the instance and will not be impacted by other users nor other virtual machines. (An exception to this is the t1 and t2 instance types.)
Specifically:
The instance is allocated a number of vCPUs. These are provided to the instance and no other instance can use these vCPUs (see note about t1 and t2 below). The EC2 Instance Type page defines a vCPU as:
Each vCPU is a hyperthread of an Intel Xeon core for M4, M3, C4, C3, R3, HS1, G2, I2, and D2.
The instance is allocated an amount of RAM. No other instance can use this RAM. There is no oversubscription of CPU nor RAM.
The instance might be allocated locally-attached disk storage, known as Instance Store or Ephemeral Storage. This disk storage does not persist when the instance is Stopped or Terminated, so only store temporary data or data that is replicated elsewhere.
The instance is allocated network bandwidth that is dedicated to that instance. No other instance can impact this network bandwidth. The network performance is based upon the selected instance type. Basically, larger instances receive more network performance.
None of the above factors are impacted by other instances (virtual machines) running on the same host.
t1 and t2 instance types
The exceptions to the above are:
t1.micro instances "provide a small amount of consistent CPU resources and allow you to increase CPU capacity in short bursts when additional cycles are available".
t2 instances provide burst capacity based upon a system of CPU Credits. CPU Credits are earned at a constant rate depending upon instance type, and these credits can be used to burst the CPU when necessary.
For both these instance types, I would assume that this burst capacity is shared between instances, so it is possible that CPU burst might be impacted by other instances also wishing to burst. The t2 instances, however, would make this 'fair' by only consuming CPU credits when the CPU did actually burst.
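To make the credit mechanism concrete, here is a minimal sketch of the arithmetic. It relies on the AWS definition that one CPU credit is one vCPU running at 100% for one minute. The earn rate of 6 credits per hour is the figure I recall for a t2.micro, and the starting balance of 30 credits is just an example input, so check both against the current T2 documentation before relying on them.

```python
def burst_minutes(balance, earn_per_hour, utilization_pct, vcpus=1):
    """Minutes an instance can sustain a utilization level before credits run out.

    One CPU credit = one vCPU at 100% for one minute, so running `vcpus` cores
    at `utilization_pct` spends vcpus * utilization_pct / 100 credits per minute.
    Returns infinity when credits are earned at least as fast as they are spent.
    """
    spend_per_minute = vcpus * utilization_pct / 100.0
    earn_per_minute = earn_per_hour / 60.0
    net_drain = spend_per_minute - earn_per_minute
    if net_drain <= 0:
        return float("inf")
    return balance / net_drain


# Assumed t2.micro figures: 6 credits earned per hour, 30 credits in the bank.
print(burst_minutes(balance=30, earn_per_hour=6, utilization_pct=100))
```

Under those assumptions the instance can run flat out for a little over half an hour, after which it falls back to its baseline. At or below the baseline the balance never drains.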
Dedicated Instances and Dedicated Hosts
Dedicated instances are "Amazon EC2 instances that run in a virtual private cloud (VPC) on hardware that's dedicated to a single customer." Basically, your AWS account will be the only account running instances on that host computer.
A Dedicated Host is a "physical server with EC2 instance capacity fully dedicated to your use. Dedicated Hosts allow you to use your existing per-socket, per-core, or per-VM software licenses, including Windows Server, Microsoft SQL Server, SUSE Linux Enterprise Server, and so on." Basically, you pay for the entire host computer and then launch individual instances on the host (at no additional charge).
The use of a Dedicated Instance or a Dedicated Host has no impact on resources allocated to each instance. They would receive the same resources as when running as a normal Shared Instance.
My company is running into a network performance problem that seemingly has all of the "experts" we're working with (VMware support, RHEL support, our managed services hosting provider) stumped.
The issue is that network latency between our VMs (even VMs residing on the same physical host) increases--up to 100x or more!--with network throughput. For example, without any network load, latency (measured by ping) might be ~0.1ms. Start transferring a couple 100MB files, and latency grows to 1ms. Initiate a bunch (~20 or so) concurrent data transfers between two VMs, and the latency between the VMs can increase to upwards of 10ms.
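To compare runs like these on more than eyeballed numbers, a small helper can summarize the round-trip times from captured `ping` output. This is only a sketch: it assumes the usual Linux `time=0.105 ms` format, and the function name is hypothetical.

```python
import re
import statistics

# Matches the per-reply round-trip time in Linux ping output.
_RTT = re.compile(r"time=([\d.]+) ms")


def summarize_ping(output):
    """Return count, min, median and max round-trip time (ms) from ping output."""
    times = [float(value) for value in _RTT.findall(output)]
    if not times:
        raise ValueError("no round-trip times found in ping output")
    return {
        "count": len(times),
        "min_ms": min(times),
        "median_ms": statistics.median(times),
        "max_ms": max(times),
    }
```

Capturing something like `ping -c 20 <other-vm>` once while idle and once during the concurrent transfers, then feeding both outputs through `summarize_ping`, gives directly comparable medians and maxima.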
This is a huge problem for us because we have application server VMs hosting processes that might issue 1 million or so queries against a database server (different VM) per hour. Adding a millisecond or two to each query therefore increases our runtime substantially--sometimes doubling or tripling our expected durations.
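The scale of the impact is easy to work out. The sketch below assumes the queries are issued one after another, so that every added millisecond lands directly on the total runtime; with concurrent queries the effect would be smaller.

```python
def added_runtime_minutes(queries, added_latency_ms):
    """Extra wall-clock minutes caused by added per-query latency.

    Assumes queries run sequentially, so the added latencies simply sum.
    """
    return queries * added_latency_ms / 1000.0 / 60.0


for extra_ms in (1, 2):
    minutes = added_runtime_minutes(1_000_000, extra_ms)
    print(f"{extra_ms} ms extra per query -> {minutes:.1f} extra minutes")
```

One million queries with 1 ms of added latency each is 1,000 seconds, or roughly 17 extra minutes; at 2 ms it is about 33 minutes.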
We've got what I would think is a pretty standard environment:
ESXi 6.0u2
4 Dell M620 blades with 2x Xeon E5-2650v2 processors and 128GB RAM
SolidFire SAN
And our base VM configuration consists of:
RHEL7, minimal install
Multiple LUNs configured for mount points at /boot, /, /var/log, /var/log/audit, /home, /tmp and swap
All partitions except /boot encrypted with LUKS (over LVM)
Our database server VMs are running Postgres 9.4.
We've already tried the following:
Change the virtual NIC from VMXNET3 to e1000 and back
Adjust RHEL ethernet stack settings
Using ESXi's "low latency" option for the VMs
Upgrading our hosts and vCenter from ESXi 5.5 to 6.0u2
Creating bare-bones VMs (set up as above with LUKS, etc., but without any of our production services on them) for testing
Moving the datastore from the SSD SolidFire SAN to local (on-blade) spinning storage
None of these improved network latency. The only test that showed expected (non-deteriorating) latency is when we set up a second pair of bare-bones VMs without LUKS encryption. Unfortunately, we need fully encrypted partitions (for which we manage the keys) because we are dealing with regulated, sensitive data.
I don't see how LUKS--in and of itself--can be to blame here. Rather, I suspect that LUKS running with some combination of ESX, our hosting hardware, and/or our VM hardware configuration is to blame.
I performed a test in a much wimpier environment (MacBook Pro, i5, 8GB RAM, VMware Fusion 6.0, CentOS 7 VMs configured similarly with LUKS on LVM and the same testing scripts) and was unable to reproduce the latency issue. Regardless of how much network traffic I sent between the VMs, latency remained steady at about 0.4ms. And this was on a laptop with a ton of other things going on!
Any pointers/tips/solutions will be greatly appreciated!
After much scrutiny and comparing the non-performing VMs against the performant VMs, we identified the issue as a bad selection for the advanced "Latency Sensitivity" setting.
For our poorly performing VMs, this was set to "Low". After changing the setting to "Normal" and restarting the VMs, latency dropped by ~100x and throughput (which we hadn't originally noticed was also a problem) increased by ~250x!
I've been given the task of monitoring an Amazon EC2 instance's resources/performance. I do not have access to the Amazon Control Panel/Dashboard, but I'm allowed to install free software on the EC2 instance that can track the stats.
I know you need to pay for in-depth/custom charts and graphs in the Amazon Control Panel. Is that the best approach for accurate stats, or is there preferable free software that can track the following stats?
Total used memory and free memory over X amount of time
Total requests made over X amount of time
Total CPU usage over X amount of time
You may want to use a good, basic monitoring service like New Relic. They have both server and application monitoring available that, together, could give you the stats you list. Your first and third items are more server-centric, while your second bullet is specific to the application you're running (i.e. Apache, NGINX, Postfix, etc.).
Here is a list of other monitoring options.
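If installing a service is not an option, the memory and request figures can also be collected with a short script on the instance. The sketch below is an assumption-laden starting point rather than a monitoring product: it parses the text of `/proc/meminfo` on Linux and counts requests per minute in an access log written in the Common Log Format (the default style for Apache and NGINX). The function names are my own.

```python
import re
from collections import Counter


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into {field: value}, values mostly in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_usage(meminfo):
    """Total, available and used memory in kB.

    Falls back to MemFree on older kernels that do not report MemAvailable.
    """
    total = meminfo["MemTotal"]
    available = meminfo.get("MemAvailable", meminfo["MemFree"])
    return {"total_kb": total, "available_kb": available, "used_kb": total - available}


# Captures the timestamp down to the minute, e.g. "10/Oct/2000:13:55".
_MINUTE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} ")


def requests_per_minute(log_lines):
    """Count access-log lines per minute, keyed by the log's own timestamp."""
    counts = Counter()
    for line in log_lines:
        match = _MINUTE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running `memory_usage(parse_meminfo(open("/proc/meminfo").read()))` from cron every minute and appending the result to a file gives a simple memory history; the log path to pass to `requests_per_minute` depends on how your web server is configured.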
http://aws.amazon.com/ec2/#pricing
I can't understand this. What is an instance? ("On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments.")
Does this mean that I can use the whole of this as my own server, like a VMware server:
(Extra Large Instance)
15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge
For $0.96 per hour?
Or does it mean only like one operation or something? What is that instance exactly?
An instance is an operating system instance (a virtual machine). Using virtualization, Amazon (like cloud providers in general) offers you a virtualized environment in which OS instances run. You have full control over the operating system inside that environment. "Per hour" means that you pay that amount for each hour your instance is running. I believe that page has almost all the details about pricing.
An instance is a virtual machine. For example, you can start up an Ubuntu instance, SSH into it, and do whatever you want.
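To put the hourly price in perspective, here is a small sketch of the arithmetic using the $0.96 figure from the question. It assumes every started hour is billed in full, which is how the quoted per-hour pricing reads; billing granularity may differ today, so treat the rounding as an assumption.

```python
import math


def on_demand_cost(hourly_rate, hours_running):
    """Cost in dollars of running one instance, billing each started hour in full."""
    billed_hours = math.ceil(hours_running)
    return round(hourly_rate * billed_hours, 2)


print(on_demand_cost(0.96, 2.5))      # a short experiment: 3 billed hours
print(on_demand_cost(0.96, 24 * 30))  # left running for a 30-day month
```

So the whole m1.xlarge is yours for as long as it runs: a couple of hours costs a few dollars, while leaving it on for a month comes to roughly $690.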