I am trying to architect a solution for Amazon EC2 that requires high network bandwidth. Is there a way to provision 10GbE connectivity between Amazon EC2 instances to get high network bandwidth?
Certain Amazon EC2 instance types launched into the same cluster placement group are placed on a non-blocking 10 Gigabit Ethernet network.
These instance types include:
m4.10xlarge
c4.8xlarge
c3.8xlarge
g2.8xlarge
r3.8xlarge
d2.8xlarge
i2.8xlarge
cc2.8xlarge
cc1.4xlarge
cr1.8xlarge
Just look in the Network Performance column in the EC2 launch console and you'll see it says "10 Gigabit".
From the Placement Groups documentation:
A placement group is a logical grouping of instances within a single Availability Zone. Using placement groups enables applications to participate in a low-latency, 10 Gbps network. Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both.
To provide the lowest latency, and the highest packet-per-second network performance for your placement group, choose an instance type that supports enhanced networking.
The following instances support enhanced networking: C3, C4, D2, I2, M4, R3
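If you are provisioning this programmatically, here is a minimal boto3 sketch that creates a cluster placement group and launches two 10 Gigabit instances into it. The group name, AMI ID, and key pair name below are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a cluster placement group: all instances land in a single
# Availability Zone on the low-latency, high-bandwidth network segment.
ec2.create_placement_group(GroupName="my-10g-group", Strategy="cluster")

# Launch instances that support enhanced networking into the group.
ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # placeholder AMI
    InstanceType="c4.8xlarge",   # a 10 Gigabit instance type from the list above
    MinCount=2,
    MaxCount=2,
    KeyName="my-key",            # placeholder key pair
    Placement={"GroupName": "my-10g-group"},
)
```

The instances can then be benchmarked against each other over their private IP addresses.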
Good morning everybody,
I am really trying to understand AWS network performance.
I ran a lot of tests, and my conclusion is that the measured speeds don't correspond to what AWS advertises.
I created 9 different EC2 instances for this test. All of them were launched in the same region (North Virginia), all running Ubuntu 18.04 (same AMI), with ENA enabled.
Measurement tool: Speedtest CLI
1st instance: t2.micro (Low to Moderate) > Down: 488.86 Mbps / Up: 946.49 Mbps
2nd instance: t2.large (Low to Moderate) > Down: 940.31 Mbps / Up: 900.47 Mbps
3rd instance: t2.xlarge (Moderate) > Down: 964.90 Mbps / Up: 942.10 Mbps
4th instance: t3.micro (Up to 5 Gigabit) > Down: 4002.90 Mbps / Up: 2844.04 Mbps
5th instance: t3a.nano (Up to 5 Gigabit) > Down: 1584.52 Mbps / Up: 1451.76 Mbps
6th instance: t3a.small (Up to 5 Gigabit) > Down: 2758.09 Mbps / Up: 2143.01 Mbps
7th instance: t3.2xlarge (Up to 5 Gigabit) > Down: 2936.57 Mbps / Up: 2733.28 Mbps
8th instance: m5a.large (Up to 10 Gigabit) > Down: 3178.89 Mbps / Up: 2291.97 Mbps
9th instance: r5n.large (Up to 25 Gigabit) > Down: 4361.71 Mbps / Up: 3383.65 Mbps
(Speedtest result screenshots for instances 1-2, 3-4, 5-6, 7-8, and 9 accompanied these measurements.)
From these results, we can conclude:
Network Performance:
Low to Moderate: the best we can get is almost 1 Gigabit, down and up. Very good speed for "low to moderate".
Moderate: stays at almost 1 Gigabit. That's fine.
Up to 5 Gigabit: down/up starts at about 1.5 Gigabit; the fastest I measured was 4002 Mbps down / 2844 Mbps up. Great, the network performance for "up to 5 Gigabit" corresponds perfectly.
Up to 10 Gigabit: download and upload are the same as on an instance rated "up to 5 Gigabit". That doesn't correspond to what it should be; at least 9 Gigabit would be fine!
Up to 25 Gigabit: I can't believe it. Download and upload are practically the SAME as "up to 5 Gigabit". It should be at least 20-22 Gigabit.
I'm almost sure the same will happen with 25, 50, 75, and 100 Gigabit instances: they won't get above about 4 Gigabit.
Could anyone explain what is going on? Is this normal? Do I need to change any settings to improve network performance?
Hope to hear from the community,
Thanks a lot!!
Mat
There appear to be several potential issues at play here.
Perhaps the most noteworthy is that the performance numbers in the specs are not Internet bandwidth, they're Ethernet bandwidth.
Network traffic to the Internet is limited to 5 Gbps (full duplex).
https://aws.amazon.com/ec2/instance-types/
You may counter that your destination is not "The Internet" because your test target is a system located in the same AWS region, but the traffic is still going through your VPC's Internet Gateway.
The next consideration is that the maximum performance potential for EC2 applies only within the same VPC, within a placement group, using private IP addresses, with multiple simultaneous traffic flows.
Traffic to and from EC2 instances in the same or different Availability Zones within a region can now take advantage of up to 5 Gbps of bandwidth for single-flow traffic, or 25 Gbps of bandwidth for multi-flow traffic (a flow represents a single, point-to-point network connection) by using private IPv4 or IPv6 addresses as described here.
https://aws.amazon.com/blogs/aws/the-floodgates-are-open-increased-network-bandwidth-for-ec2-instances/
Before you can use a tool like this for benchmarking, you also need to verify that the tool is actually capable of accurately testing the range of capabilities in question.
Speed test tools require a server on the other end that is capable of generating enough data to saturate the connection, and you presumably don't have a way of verifying that, though you might inquire about the tester's actual capabilities with Ookla.
You also need to verify that the tool isn't exhausting some other resource on the server, such as CPU, which might not manifest itself as 100% CPU utilization because the number of saturated cores won't exceed the number of threads running.
See https://aws.amazon.com/premiumsupport/knowledge-center/network-throughput-benchmark-linux-ec2/ for benchmarking guidelines, including using iperf on two machines to test the bandwidth between them.
It very much depends on how your tool is testing speed.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html
Bandwidth for single-flow (5-tuple) traffic is limited to 5 Gbps when instances are not in the same cluster placement group.
I’m not sure exactly how the Speedtest CLI is implemented, but it’s not uncommon for tools like that to utilize a single flow. The following documentation indicates that at least the destination port is held constant during the test, and we know that the source and destination IP addresses will remain constant, as will the protocol (TCP). So if the source port isn’t varied, that’s a single 5-tuple, which is going to run into the 5 Gbps limit.
https://help.speedtest.net/hc/en-us/articles/360038679354-How-does-Speedtest-measure-my-network-speeds-
The client establishes multiple connections with the server over port: 8080.
What you’d want to do is pick a traffic benchmarking tool that allows for multiple flows, or verify that the Speedtest CLI is actually opening multiple source ports while it’s running. I’d recommend iperf; make sure to use the latest version.
The big takeaway here is that advertised bandwidth numbers are generally describing a best-case usage pattern, where the traffic being transmitted consists of large packets on many flows. Under the hood, there are systems that need to process this traffic, and generally each flow will be sticky to one processing core of that system/systems. The bulk of the work that they need to do is fixed per packet, so large packets mean more bandwidth utilized for less real work. And multiple flows means distribution over multiple processing cores, instead of bottlenecking on one.
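If you want to sanity-check the single-flow versus multi-flow difference yourself, here is a rough, hypothetical sketch using plain Python sockets (it is not a substitute for iperf): each parallel connection gets its own source port, so each one is a distinct 5-tuple that can be spread across separate processing cores. The port number and flow count below are arbitrary choices.

```python
"""Rough multi-flow throughput sketch (hypothetical; use iperf for real benchmarks).

Run `python flows.py server` on one instance and
`python flows.py client <server-private-ip>` on the other.
"""
import socket, sys, threading, time

PORT = 5201            # arbitrary test port (open it in the security group)
FLOWS = 8              # number of parallel TCP connections (distinct 5-tuples)
CHUNK = b"x" * 65536   # 64 KiB writes
SECONDS = 10           # test duration

def drain(conn):
    # Receive and discard everything the client sends on one connection.
    while conn.recv(65536):
        pass

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", PORT))
    srv.listen(FLOWS)
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=drain, args=(conn,), daemon=True).start()

def client(host):
    sent = [0] * FLOWS
    def flow(i):
        s = socket.create_connection((host, PORT))
        end = time.time() + SECONDS
        while time.time() < end:
            s.sendall(CHUNK)
            sent[i] += len(CHUNK)
        s.close()
    threads = [threading.Thread(target=flow, args=(i,)) for i in range(FLOWS)]
    for t in threads: t.start()
    for t in threads: t.join()
    gbps = sum(sent) * 8 / SECONDS / 1e9
    print(f"{FLOWS} flows, aggregate ~{gbps:.2f} Gbit/s")

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])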
Our current cassandra cluster is 26 nodes spread across four AWS EC2 regions. We use elastic IP's for all of our nodes, and we use security groups to allow all nodes to talk to each other.
There is a hard limit of 250 security group rules per network interface (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html#vpc-limits-security-groups). From the documentation:
You can have 50 inbound and 50 outbound rules per security group (giving a total of 100 combined inbound and outbound rules). If you need to increase or decrease this limit, you can contact AWS Support — a limit change applies to both inbound and outbound rules. However, the multiple of the limit for inbound or outbound rules per security group and the limit for security groups per network interface cannot exceed 250
Since each node needs a security group entry for every other node in the cluster, that means we have a hard limit of a 250 node cluster.
I can think of some ways to mitigate the issue, like using two security groups, where one allows access from the 'local' nodes in the same region, and the other security group has the elastic IP's for the 'remote' nodes in other regions. This would help, but only a little bit.
I have been in touch with AWS technical support about this, and they suggested using contiguous blocks of elastic IP's (which I did not know was possible). This seems like it would solve the problem, but it turns out this is an involved process, and requires us (my company) to become the ARIN owner of these IP's. The AWS reps are pushing me towards alternatives, such as DynamoDB. I am open to other technologies to solve our problem, but I get the feeling they are just trying to get us to use AWS managed services. On top of that they are rather slow in getting back to me when I ask questions like "Is this typically how customers run multi region cassandra clusters in ec2?"
Does anyone have experience with this approach?
What are some alternatives to using security groups? Does anyone have experience running a large (>250 node), multi region cassandra cluster in EC2? Or even a cluster with >50 nodes (at which point a single security group isn't feasible any more)?
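For context on the contiguous-block suggestion: its appeal is that an entire Elastic IP block collapses into a single CIDR rule instead of one rule per node. Here is a minimal boto3 sketch of the difference; the security group ID, addresses, and the Cassandra inter-node ports (7000-7001) are placeholders/assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# One rule per remote node: with dozens of nodes this is what eats into
# the 250-rules-per-interface limit.
per_node_rules = [
    {"IpProtocol": "tcp", "FromPort": 7000, "ToPort": 7001,
     "IpRanges": [{"CidrIp": f"{ip}/32"}]}
    for ip in ["203.0.113.10", "203.0.113.11"]   # ...one entry per node
]

# One rule for a contiguous Elastic IP block: covers every node in the
# remote region with a single entry.
block_rule = [
    {"IpProtocol": "tcp", "FromPort": 7000, "ToPort": 7001,
     "IpRanges": [{"CidrIp": "203.0.113.0/27"}]}   # placeholder block
]

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",   # placeholder security group
    IpPermissions=block_rule,
)
```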
I am new to AWS.
I have some questions I want to ask.
My EC2 instance:
Instance type: t2.micro - Windows Server
EC2 region: Asia Pacific (Tokyo)
S3 region: AWS says "S3 does not require region selection."
User location: Taiwan
My ping to EC2 is too high for my real-time game, and S3 downloads are sometimes very slow.
My network types are WiFi and cellular (3G & 4G).
I have tested with my local server: 5 users connected and everything worked fine.
Average 80 kb/s per user for EC2.
Questions:
1. Why is the ping from my client to EC2 always over 100 ms?
2. How can I reduce the ping to under 100 ms?
3. Why is the S3 download speed so unstable (50 KB/s ~ 5 MB/s)?
4. How can I keep the download speed stable?
Regarding S3:
It DOES matter where you create your S3 bucket:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html
"Amazon S3 creates bucket in a region you specify. You can choose any AWS region that is geographically close to you to optimize latency, minimize costs, or address regulatory requirements. For example, if you reside in Europe, you might find it advantageous to create buckets in the EU (Ireland) region."
On the other hand, "S3 does not require region selection" means you can access any bucket from any region, though access is optimized for the region where the bucket is located. A bucket's location can be set only at the time it is created.
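Since your users are in Taiwan and your instance is in Tokyo, creating the bucket in the Tokyo region at creation time is the one-time choice that matters. A minimal boto3 sketch, with a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# The region is fixed at creation time; here the bucket is placed in
# Tokyo (ap-northeast-1), close to both the EC2 instance and Taiwan.
s3.create_bucket(
    Bucket="my-game-assets-example",   # placeholder; bucket names are global
    CreateBucketConfiguration={"LocationConstraint": "ap-northeast-1"},
)
```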
Regarding unstable times:
t2.micro is not designed to provide stable capacity; instead, its capacity is throttled. In fact, any instance type starting with t2 has throttled (non-constant) CPU capacity. I suggest you try a larger instance, like m1.small.
See http://www.ec2instances.info/ ; any instance listed as "burstable" does not have stable performance.
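One way to confirm whether the t2.micro is being throttled is to check its CPU credit balance in CloudWatch: if CPUCreditBalance sits near zero, the instance is running at its throttled baseline. A minimal boto3 sketch; the instance ID is a placeholder.

```python
import boto3
from datetime import datetime, timedelta

cw = boto3.client("cloudwatch", region_name="ap-northeast-1")

# CPUCreditBalance is reported for burstable (t2) instances; a balance
# near zero means the instance has been throttled to its baseline.
stats = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```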
I want to understand how ELB load balances between multiple availability zones. For example, if I have 4 instances (a1, a2, a3, a4) in zone us-east-1a and a single instance d1 in us-east-1d behind an ELB, how is the traffic distributed between the two availability zones? i.e., would d1 get nearly 50% of all the traffic or 1/5th of the traffic?
If you enable ELB Cross-Zone Load Balancing, d1 will get 20% of the traffic.
Here's what happens without Cross-Zone Load Balancing enabled:
D1 would get nearly 50% of the traffic. This is why Amazon recommends adding the same amount of instances from each AZ to your ELB.
The following excerpt is extracted from Overview of Elastic Load Balancing:
Incoming traffic is load balanced equally across all Availability Zones enabled for your load balancer, so it is important to have approximately equivalent numbers of instances in each zone. For example, if you have ten instances in Availability Zone us-east-1a and two instances in us-east-1b, the traffic will still be equally distributed between the two Availability Zones. As a result, the two instances in us-east-1b will have to serve the same amount of traffic as the ten instances in us-east-1a. As a best practice, we recommend you keep an equivalent or nearly equivalent number of instances in each of your Availability Zones. So in the example, rather than having ten instances in us-east-1a and two in us-east-1b, you could distribute your instances so that you have six instances in each Availability Zone.
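If rebalancing your instance counts isn't practical, cross-zone load balancing can be turned on for a Classic ELB with a single API call. A minimal boto3 sketch, with a placeholder load balancer name:

```python
import boto3

elb = boto3.client("elb", region_name="us-east-1")

# With cross-zone load balancing enabled, requests are spread across all
# registered instances regardless of their Availability Zone, so d1 in
# the example above would receive roughly 1/5 of the traffic.
elb.modify_load_balancer_attributes(
    LoadBalancerName="my-load-balancer",   # placeholder
    LoadBalancerAttributes={"CrossZoneLoadBalancing": {"Enabled": True}},
)
```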
The load balancing between different Availability Zones is done via DNS. When a DNS resolver on the client asks for the IP address of the ELB, it gets two addresses and chooses one of them (usually the first). The DNS server usually responds with the addresses in a random order, so no single IP is used all of the time; each IP is used only part of the time (half of the time for 2 addresses, a third of the time for 3, and so on).
Then behind these IP addresses you have an ELB server in each availability zone that has your instances connected to it. This is the reason why a zone with just a single instance will get the same amount of traffic as all the instances in another zone.
When you get to the point that you have a very large number of instances, ELB can decide to create two such servers in a single Availability Zone, but in that case it will split your instances so that each server handles half (or some other equal share) of them.
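You can observe this DNS behaviour yourself: resolving the load balancer's name returns one address per enabled Availability Zone (the hostname below is a placeholder).

```python
import socket

# An ELB DNS name resolves to one IP per enabled Availability Zone;
# clients pick one of them, which is how traffic is split across zones.
addresses = {
    info[4][0]
    for info in socket.getaddrinfo(
        "my-elb-1234567890.us-east-1.elb.amazonaws.com", 80,
        proto=socket.IPPROTO_TCP)
}
print(addresses)
```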
For my application I am using Auto Scaling without Elastic Load Balancing. Is there any performance issue with using Auto Scaling directly, without an ELB?
Adi,
David is right.
Auto Scaling allows you to scale instances (based on CloudWatch metrics, a single event, or a recurring schedule).
Suppose you have three instances running (scaled with Auto Scaling): how is traffic going to reach them? You need to implement load balancing somewhere, and that's why Elastic Load Balancing is so useful.
Without it, your traffic can only be directed in a poorly engineered manner.
See Slide #5 of this presentation on slideshare, to get a sense of the architecture: http://www.slideshare.net/harishganesan/scale-new-business-peaks-with-auto-scaling
Best,
Autoscaling determines, based on some measurement (CPU load is a common measurement), whether or not to increase/decrease the number of instances running.
Load balancing relates to how you distribute traffic to your instances, based on domain name lookup, etc. Somewhere you must have knowledge of which IP addresses are currently assigned to the instances that Auto Scaling creates.
You can have multiple IP address entries as A records in your DNS settings, and machines will be allocated in a roughly round-robin fashion from that pool. But keeping the pool up to date in real time is hard.
The load balancer gives you an easy mechanism to provide a single interface/IP address to the outside world and it has knowledge of which instances it is load balancing in real time.
If you are using autoscaling, unless you are going to create a fairly complex monitoring and DNS updating system, you can reasonably assume that you must use a load balancer as well.
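If you do add one, a Classic ELB can be attached to an existing Auto Scaling group so that new instances are registered (and deregistered) automatically. A minimal boto3 sketch with placeholder names:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Attach a Classic ELB to an existing Auto Scaling group; instances the
# group launches are registered with the load balancer automatically.
autoscaling.attach_load_balancers(
    AutoScalingGroupName="my-asg",           # placeholder
    LoadBalancerNames=["my-load-balancer"],  # placeholder
)
```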