EC2 Instance, sudden network performance cap - amazon-ec2

Anyone else experienced a sudden network performance cap recently?
Our instances used to reach an average of about 100,000,000 bytes, but all of a sudden we're down to 50,000,000 without warning. This happened two days ago, at around Oct 16 11:40 UTC.
I'm using a c3.xlarge instance with "moderate" network performance. Did they lower the cap for "moderate" performance?
It would be nice to hear whether anyone else has this problem, since it's pretty weird that they would do that without warning. I can't find any information on this.
I've attached a screenshot as proof; the instance type was not changed at that time.
It's the same problem on both Network In and Network Out.
Graph:
http://i.stack.imgur.com/WQ9Sf.png

This is par for the course with shared tenancy. Most instances, except for the largest instance sizes, are on hardware shared with other instances. This means all resources are shared, including network bandwidth.
When no other instances are using the bandwidth available to your host, you can generally take advantage of most, if not all, of it. If other instances are attempting to saturate the host's bandwidth, then the host will schedule your bandwidth based on your network priority.
Moderate does not mean you are guaranteed a certain amount of bandwidth. Instead, it gives you a certain priority in comparison with the other instances on the host.
What can you do about this? You could stop/start your instance until you get assigned to a host without any noisy neighbors. You could also scale horizontally to give yourself more available bandwidth.

Related

Live traffic on port via snmp and discrepancies

I am trying to get data from HP switches and Juniper firewalls, and their ports, via SNMP.
I am looking for a way to analyze live traffic on a port so I can create a graph of port utilization, like in SolarWinds or Observium.
So far, the results I am getting come from the formula in "How to calculate traffic on cisco".
It works fine; however, every couple of readings I get abnormal speeds. For example, for a virtual interface on the firewall, which is limited to 4MB, I get 20+ MB every now and then.
I have a cron job which polls the devices every 5 minutes, so the formula uses 300 seconds as the time delta.
So the question is, is it possible for a port to be showing these abnormalities or am I doing something wrong? Any insight would be amazing :-)
The problem is that you are using ifTable as defined in RFC 1213. It is somewhat outdated because ifInOctets and ifOutOctets are defined as 32-bit counters. On a busy port they wrap around to zero quickly, and you get abnormal results whenever that happens between two polls. I'd suggest switching to ifXTable (IF-MIB), where these counters are defined as 64-bit values (ifHCInOctets and ifHCOutOctets).
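If you have to stay on the 32-bit counters for a while, a single wrap between two polls can still be corrected in the rate calculation. This is only a minimal sketch: the function names and the sample readings are made up for illustration, and it assumes the counter wrapped at most once during the interval. With a 300 second interval that assumption only holds below roughly 114.5 Mbit/s average, so faster ports need either a shorter polling interval or the 64-bit counters.

```python
def counter_delta(previous, current, bits=32):
    """Octets counted between two readings of a counter that wraps at 2**bits.

    Correct only if the counter wrapped at most once between the readings.
    """
    return (current - previous) % (1 << bits)


def rate_bits_per_second(previous, current, interval_seconds, bits=32):
    """Average traffic rate over one polling interval, in bits per second."""
    return counter_delta(previous, current, bits) * 8 / interval_seconds


def max_unambiguous_rate(interval_seconds, bits=32):
    """Average rate (bits/s) at which the counter wraps once per interval.

    At or above this rate the delta between two polls is ambiguous.
    """
    return (1 << bits) * 8 / interval_seconds


if __name__ == "__main__":
    # Hypothetical readings taken 300 seconds apart; the counter wrapped once.
    print(counter_delta(4294967000, 704))             # 1000 octets
    print(round(max_unambiguous_rate(300) / 1e6, 1))  # 114.5 (Mbit/s)
```

A naive `current - previous` on those two readings would give a large negative number, which is the kind of reading that turns into an abnormal spike once it is cast or clamped.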

Performance of CPU

While going through Computer Organisation by Patterson, I encountered a question on which I am completely stuck. The question is:
Suppose we know that an application that uses both a desktop client and a remote server is limited by network performance. For the following changes state whether only the throughput improves, both response time and throughput improve, or neither improves.
And the changes made are:
More memory is added to the computer
If we add more memory, shouldn't the throughput and execution time improve?
To be clear, throughput and response time are defined in the book as:
Throughput: The amount of work done in a given time.
Response Time: the time required to complete a task; tasks are I/O device activities, operating system overhead, disk access, memory access.
Assume the desktop client is your internet browser. And the server is the internet, for example the stackoverflow website. If you're having network performance problems, adding more RAM to your computer won't make browsing the internet faster.
More memory helps only when the application needs more memory. For any other limitation, the additional memory will simply remain unused.
You have to think like a text book here. If your only given constraint is network performance, then you have to assume that there are no other constraints.
Therefore, the question boils down to: how does increasing the memory affect network performance?
If you throw in other constraints such as the system is low on memory and actively paging, then maybe response time improves with more memory and less paging. But the only constraint given is network performance.
It won't make a difference, as you are already bound by network performance. Imagine you have a large tank of water and a tiny pipe coming out of it. Suppose you want to get more water within a given amount of time (throughput). Does it make sense to add more water to the tank to achieve that? It does not, as we are bound by the width of the pipe. Either you add more pipes or you widen the pipe you have.
Going back to your question, if the whole system is bound by network performance, you need to add more bandwidth to see any improvement. Doing anything else is pointless.
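The tank-and-pipe argument can be written down as a tiny model: when stages sit in series, the slowest one sets the throughput. The capacities below are invented numbers in requests per second, and treating "more memory" as a higher client capacity is an assumption made only to illustrate the point.

```python
def system_throughput(stage_capacities):
    """Throughput of stages in series is limited by the slowest stage."""
    return min(stage_capacities.values())


def bottleneck(stage_capacities):
    """Name of the stage that currently limits the system."""
    return min(stage_capacities, key=stage_capacities.get)


# Hypothetical capacities, in requests per second.
before = {"client": 500, "network": 40, "server": 800}
after_more_memory = dict(before, client=900)    # faster client, same network
after_more_bandwidth = dict(before, network=80)  # wider pipe

if __name__ == "__main__":
    print(bottleneck(before), system_throughput(before))  # network 40
    print(system_throughput(after_more_memory))           # 40
    print(system_throughput(after_more_bandwidth))        # 80
```

Only the change that touches the limiting stage moves the result, which is why the textbook answer for "more memory" is that neither throughput nor response time improves.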

SAN Performance

I have a question regarding SAN performance, specifically on an EMC VNX SAN. I have a significant number of processes, typically around 200, spread over a number of blade servers and running concurrently. Each process loads 2 small files from storage, one 3KB and one 30KB. There are about 20 million files to be processed. The processes run on Windows Server on VMware. The way this was originally set up was 1TB LUNs on the SAN bundled into a single 15TB drive in VMware and then shared as a network share from one Windows instance to all the processes. With the processes running concurrently, performance is abysmal. Essentially, 200 simultaneous requests are being serviced by the SAN through a Windows share at the same time, and the SAN is not handling it too well. I'm looking for suggestions to improve performance.
With all performance questions, there's a degree of 'it depends'.
When you're talking about accessing a SAN, there's a chain of potential bottlenecks to unravel. First though, we need to understand what the actual problem is:
Do we have problems with throughput - e.g. sustained transfer, or latency?
It sounds like we're looking at random read IO - which is one of the hardest workloads to service, because predictive caching doesn't work.
So begin at the beginning:
What sort of underlying storage are you using?
Have you fallen into the trap of buying big SATA and configuring it as RAID-6? I've seen plenty of places do this because it looks like cheap terabytes, without really doing the sums on the performance. A SATA drive starts to slow down at about 75 IO operations per second. If you've got big drives - 3TB for example - that's 25 IOPS per terabyte. As a rough rule of thumb, allow 200 IOPS per drive for FC/SAS and 1500 for SSD.
Are you tiering?
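To make "doing the sums" concrete, here is a rough sketch using the per-drive rule-of-thumb figures above. The target of 3000 IOPS is a hypothetical workload, and the calculation deliberately ignores RAID write penalties, caching and controller limits, so treat the drive counts as a lower bound.

```python
import math

# Rule-of-thumb IOPS per drive, taken from the figures quoted above.
RULE_OF_THUMB_IOPS = {"SATA": 75, "FC/SAS": 200, "SSD": 1500}


def iops_per_terabyte(iops_per_drive, drive_size_tb):
    """How much random IO each terabyte of capacity can sustain."""
    return iops_per_drive / drive_size_tb


def drives_needed(target_iops, iops_per_drive):
    """Minimum number of drives to reach a target, ignoring RAID overhead."""
    return math.ceil(target_iops / iops_per_drive)


if __name__ == "__main__":
    print(iops_per_terabyte(75, 3))  # 25.0 for a 3TB SATA drive
    target = 3000  # hypothetical random-read workload
    for kind, iops in RULE_OF_THUMB_IOPS.items():
        print(kind, drives_needed(target, iops))
```

For that hypothetical target the sketch gives 40 SATA drives, 15 FC/SAS drives or 2 SSDs, which shows why sizing by capacity alone goes wrong.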
Storage tiering is a clever trick of making a 'sandwich' out of different speeds of disk. This usually works, because usually only a small fraction of a filesystem is 'hot' - so you can put the hot part on fast disk, and the cold part on slow disk, and average performance looks better. This doesn't work for random IO or cold read accesses. Nor does it work for full disk transfers - as only 10% of it (or whatever proportion) can ever be 'fast' and everything else has to go the slow way.
What's your array level contention?
The point of SAN is that you aggregate your performance, such that each user has a higher peak and a lower average, as this reflects most workloads. (When you're working on a document, you need a burst of performance to fetch it, but then barely any until you save it again).
How are you accessing your array?
Typically a SAN is accessed using a Fibre Channel network. There's a whole bunch of technical differences from 'real' networks, but they don't matter to you; contention and bandwidth still do. With ESX in particular, I find there's a tendency to underestimate storage IO needs. (Multiple VMs using a single pair of HBAs means you get contention on the ESX server).
What sort of workload are we dealing with?
One of the other core advantages of storage arrays is caching mechanisms. They generally have very large caches and some clever algorithms to take advantage of workload patterns such as temporal locality and sequential or semi-sequential IO. Write loads are easier to handle for an array, because despite the horrible write penalty of RAID-6, write operations are under a soft time constraint (they can be queued in cache) but read operations are under a hard time constraint (the read cannot complete until the block is fetched).
This means that for true random read, you're basically not able to cache at all, which means you get worst case performance.
Is the problem definitely your array? It sounds like you have a single VM with 15TB presented, and that VM is handling the IO. That's a bottleneck right there. How many IOPS is the VM generating to the ESX server, and what's the contention like there? What's the networking like? How many other VMs are using the same ESX server and might be sources of contention? Is it a pass-through LUN, or a VMFS datastore with a VMDK?
So - there's a bunch of potential problems, and as such it's hard to roll it back to a single source. All I can give you is some general recommendations to getting good IO performance.
Fast disks (they're expensive, but if you need the IO, you need to spend money on it).
Shortest path to storage (don't put a VM in the middle if you can possibly avoid it. For CIFS shares a NAS head may be the best approach).
Try to make your workload cacheable - I know, easier said than done. But with millions of files, if you've got a predictable fetch pattern your array will start prefetching, and it'll go a LOT faster. You may find that if you start archiving the files into large 'chunks' you'll gain performance (because the array/client will fetch the whole chunk, and it'll be available for the next client).
Basically the 'lots of small random IO operations' especially on slow disks is really the worst case for storage, because none of the clever tricks for optimization work.

What are the theoretical performance limits on web servers?

In a currently deployed web server, what are the typical limits on its performance?
I believe a meaningful answer would be one of 100, 1,000, 10,000, 100,000 or 1,000,000 requests/second, but which is true today? Which was true 5 years ago? Which might we expect in 5 years? (ie, how do trends in bandwidth, disk performance, CPU performance, etc. impact the answer)
If it is material, the fact that HTTP over TCP is the access protocol should be considered. OS, server language, and filesystem effects should be assumed to be best-of-breed.
Assume that the disk contains many small unique files that are statically served. I'm intending to eliminate the effect of memory caches, and that CPU time is mainly used to assemble the network/protocol information. These assumptions are intended to bias the answer towards 'worst case' estimates where a request requires some bandwidth, some cpu time and a disk access.
I'm only looking for something accurate to an order of magnitude or so.
Read http://www.kegel.com/c10k.html. You might also read StackOverflow questions tagged 'c10k'. C10K stands for 10,000 simultaneous clients.
Long story short -- principally, the limit is neither bandwidth, nor CPU. It's concurrency.
Six years ago, I saw an 8-proc Windows Server 2003 box serve 100,000 requests per second for static content. That box had 8 Gigabit Ethernet cards, each on a separate subnet. The limiting factor there was network bandwidth. There's no way you could serve that much content over the Internet, even with a truly enormous pipe.
In practice, for purely static content, even a modest box can saturate a network connection.
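A quick back-of-the-envelope check of the bandwidth claim, as a sketch only: it treats every request as a fixed number of bytes on the wire and ignores protocol overhead, and the 100 Mbit/s uplink at the end is a hypothetical example rather than anything from the box described above.

```python
def max_requests_per_second(link_bits_per_second, response_bytes):
    """Upper bound on requests/second when the link is the only limit."""
    return link_bits_per_second / (response_bytes * 8)


def response_bytes_at(link_bits_per_second, requests_per_second):
    """Bytes per request that would exactly fill the link at a given rate."""
    return link_bits_per_second / (requests_per_second * 8)


if __name__ == "__main__":
    eight_gigabit_nics = 8 * 1_000_000_000
    # 100,000 requests/second over 8 x 1 Gbit/s leaves about 10 KB each.
    print(response_bytes_at(eight_gigabit_nics, 100_000))  # 10000.0
    # Hypothetical 100 Mbit/s uplink serving the same 10 KB responses.
    print(max_requests_per_second(100_000_000, 10_000))    # 1250.0
```

So the 100,000 requests/second figure is consistent with responses of roughly 10 KB filling all eight links, and the same content over a much smaller pipe drops by the same factor as the bandwidth.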
For dynamic content, there's no easy answer. It could be CPU utilization, disk I/O, backend database latency, not enough worker threads, too much context switching, ...
You have to measure your application to find out where your bottlenecks lie. It might be in the framework, it might be in your application logic. It probably changes as your workload changes.
I think it really depends on what you are serving.
If you're serving web applications that dynamically render html, CPU is what is consumed most.
If you are serving up a relatively small number of static items lots and lots of times, you'll probably run into bandwidth issues (since the static files themselves will probably find themselves in memory)
If you're serving up a large number of static items, you may run into disk limits first (seeking and reading files)
If you are not able to cache your files in memory, then disk seek times will likely be the limiting factor and limit your performance to less than 1000 requests/second. This might improve when using solid state disks.
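The "less than 1000 requests/second" estimate follows from simple arithmetic if each request costs one random disk access. This is a sketch under that assumption; the 10 ms and 0.1 ms access times are illustrative values for a spinning disk and an SSD, not measurements.

```python
def disk_bound_requests_per_second(access_time_ms, disks=1):
    """Requests/second when every request needs one random disk access.

    Assumes requests are spread evenly over independent disks.
    """
    return disks * 1000 / access_time_ms


if __name__ == "__main__":
    print(disk_bound_requests_per_second(10))           # one spinning disk
    print(disk_bound_requests_per_second(10, disks=4))  # four spinning disks
    print(disk_bound_requests_per_second(0.1))          # one SSD (assumed)
```

With the assumed 10 ms per access, one disk tops out at about 100 requests/second and it takes around ten of them to approach 1000, while the assumed SSD figure is two orders of magnitude higher.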
"100, 1,000, 10,000, 100,000 or 1,000,000 requests/second, but which is true today?"
This test was done on a modest i3 laptop, but it reviewed Varnish, ATS (Apache Traffic Server), Nginx, Lighttpd, etc.
http://nbonvin.wordpress.com/2011/03/24/serving-small-static-files-which-server-to-use/
The interesting point is that using a high-end 8-core server gives very little boost to most of them (Apache, Cherokee, Litespeed, Lighttpd, Nginx, G-WAN):
http://www.rootusers.com/web-server-performance-benchmark/
As the tests were done on localhost to avoid hitting the network as a bottleneck, the problem is in the kernel which does not scale - unless you tune its options.
So, to answer your question, the progress margin is in the way servers process IO.
They will have to use better data structures (wait-free).
I think there are too many variables here to answer your question.
What processor, what speed, what cache, what chipset, what disk interface, what spindle speed, what network card, how configured, the list is huge. I think you need to approach the problem from the other side...
"This is what I want to do and achieve, what do I need to do it?"
OS, server language, and filesystem effects are the variables here. If you take them out, then you're left with a no-overhead TCP socket.
At that point it's not really a question of performance of the server, but of the network. With a no-overhead TCP socket your limit that you will hit will most likely be at the firewall or your network switches with how many connections can be handled concurrently.
In any web application that uses a database you also open up a whole new range of optimisation needs.
indexes, query optimisation etc
For static files, does your application cache them in memory?
etc, etc, etc
This will depend on what your CPU core is.
What speed are your disks?
What is a 'fat' pipe for a 'medium'-sized hosting company?
What is the web server?
The question is too general.
Deploy your server, test it using tools like http://jmeter.apache.org/ and see how you get on.

How can I estimate ethernet performance?

I need to think about performance limitations of 100 mbps ethernet (including scenarios with up to ~100 endpoints on the same subnet) and I'm wondering how best to go about estimating the capacity of the network. Are there any rules of thumb for this?
The reason I ask is that I am working on some back-of-the-envelope level calculations about performance limitations, so it doesn't need to be incredibly accurate. I just haven't been through this exercise before and was hoping to gain some insight from those who have. Mark Brackett's answer (as of 1/26) is along the lines of what I am looking for.
If you're using switches (and, honestly, who isn't these days) - then I've found 80% of capacity a reasonable estimate. Usually, it's really about 90% because of TCP overhead - but 80% accounts for occasional retransmits.
If it's a single collision domain (hubs), then you'd probably be around 30% with moderate activity on those 100 nodes. But, it'd be pretty variable based on the traffic generated. And anyone putting 100 nodes in a single CD these days would no doubt be shot - so I don't think you'll actually run into those IRL.
Edit: Note that these numbers are for a relatively healthy network - one that is generally defined as working. Extremely excessive broadcasts or other anomalous traffic patterns have been known to bring a network to its knees.
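For a back-of-the-envelope number, the percentages above translate into bytes per second like this. It is a rough sketch: the even split across 100 endpoints assumes they all contend for the same segment or the same server port, which is the worst case rather than the typical one on a switched network.

```python
def usable_bytes_per_second(link_bits_per_second, utilisation_percent):
    """Usable capacity of a link at a given rule-of-thumb utilisation."""
    return link_bits_per_second * utilisation_percent / 100 / 8


def fair_share(total_bytes_per_second, endpoints):
    """Even split of capacity among endpoints contending for one link."""
    return total_bytes_per_second / endpoints


if __name__ == "__main__":
    fast_ethernet = 100_000_000  # 100 Mbit/s
    switched = usable_bytes_per_second(fast_ethernet, 80)
    hub = usable_bytes_per_second(fast_ethernet, 30)
    print(switched, hub)
    print(fair_share(switched, 100), fair_share(hub, 100))
```

That works out to about 10 MB/s usable on a switched port and 3.75 MB/s on a shared collision domain, or roughly 100 KB/s and 37.5 KB/s per endpoint if all 100 nodes hit the same bottleneck at once.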
Use WANem
WANem is a Wide Area Network Emulator, meant to provide a real experience of a Wide Area Network/Internet, during application development / testing over a LAN environment.
You can simulate any network scenario using it and then test your application's behaviour under those conditions. It is open-source and is available on SourceForge.
Link : WANem - The Wide Area Network emulator
Opnet creates software for simulating network performance. I once used Opnet IT Guru Academic edition. Maybe this application or some other software from opnet may be of some help.
100 endpoints are not supposed to be an issue. If the network is properly configured (nothing special), the only issue is the bandwidth. Fast Ethernet (100 Mbps) should be able to transfer almost 10 MB (megabytes) per second. It can deliver that to one client or split it among many. If you are using hubs instead of switches, or half-duplex instead of full-duplex, then you should change that (this is the rule of thumb).
Working from the title of your post, "How can I estimate Ethernet performance", see this wiki link; http://en.wikipedia.org/wiki/Ethernet_frame#Maximum_throughput
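The maximum-throughput figure from that article can be reproduced from the standard frame layout. This sketch assumes full-size 1500-byte payloads, no VLAN tag, and IPv4 plus TCP headers without options; smaller frames do noticeably worse.

```python
PREAMBLE_AND_SFD = 8   # 7-byte preamble plus start frame delimiter
ETHERNET_HEADER = 14   # destination MAC, source MAC, EtherType
FCS = 4                # frame check sequence
INTERFRAME_GAP = 12    # mandatory idle time, counted in byte times
MTU = 1500             # largest standard payload
IP_TCP_HEADERS = 40    # IPv4 (20) + TCP (20), assuming no options


def wire_bytes_per_frame(payload=MTU):
    """Byte times one frame occupies on the wire, including the gap."""
    return PREAMBLE_AND_SFD + ETHERNET_HEADER + payload + FCS + INTERFRAME_GAP


def efficiency(useful_bytes, payload=MTU):
    """Fraction of the line rate that carries the useful bytes."""
    return useful_bytes / wire_bytes_per_frame(payload)


if __name__ == "__main__":
    print(wire_bytes_per_frame())                      # 1538
    print(round(efficiency(MTU), 4))                   # 0.9753
    print(round(efficiency(MTU - IP_TCP_HEADERS), 4))  # 0.9493
```

On 100 Mbps Ethernet that puts the ceiling for TCP payload at about 94.9 Mbit/s before any retransmits or acknowledgement traffic, so planning around 80 to 90 percent of line rate leaves a sensible margin.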
