AWS Elastic Load Balancer and multiple availability zones - amazon-ec2

I want to understand how ELB load balances between multiple availability zones. For example, if I have 4 instances (a1, a2, a3, a4) in zone us-east-1a and a single instance d1 in us-east-1d behind an ELB, how is the traffic distributed between the two availability zones? i.e., would d1 get nearly 50% of all the traffic or 1/5th of the traffic?

If you enable ELB Cross-Zone Load Balancing, d1 will get 20% of the traffic.
Here's what happen without enabling Cross-Zone Load Balancing:
D1 would get nearly 50% of the traffic. This is why Amazon recommends adding the same amount of instances from each AZ to your ELB.
The following excerpt is extracted from Overview of Elastic Load Balancing:
Incoming traffic is load balanced equally across all Availability Zones enabled for your load balancer, so it is important to have approximately equivalent numbers of instances in each zone. For example, if you have ten instances in Availability Zone us-east-1a and two instances in us-east-1b, the traffic will still be equally distributed between the two Availability Zones. As a result, the two instances in us-east-1b will have to serve the same amount of traffic as the ten instances in us-east-1a. As a best practice, we recommend you keep an equivalent or nearly equivalent number of instances in each of your Availability Zones. So in the example, rather than having ten instances in us-east-1a and two in us-east-1b, you could distribute your instances so that you have six instances in each Availability Zone.

The load balancing between different availability zones is done via DNS. When a DNS resolver on the client asks for the IP address of the ELB, it gets two addresses. And chooses to use one of them (usually the first). The DNS server usually responds with a random order, so the first IP is not used at all times but each IP is used only part of the time (half for 2, third of the time for 3, etc ...).
Then behind these IP addresses you have an ELB server in each availability zone that has your instances connected to it. This is the reason why a zone with just a single instance will get the same amount of traffic as all the instances in another zone.
When you get to the point that you have a very large number of instances, ELB can decide to create two such servers in a single availability zone, but in this case it will split your instances for it to have half (or some other equal division) of your instances.

Related

10GB connectivity between Amazon Instances

I am trying to architect a solution for Amazon EC2 that requires high network bandwidth. Is there a way to provision 10GbE connectivity between Amazon ec2 instances to get high network bandwidth?
Certain Amazon EC2 instances types launched into the same cluster placement group are placed into a non-blocking 10 Gigabit ethernet network.
These instance types include:
m4.10xlarge
c4.8xlarge
c3.8xlarge
g2.8xlarge
r3.8xlarge
d2.8xlarge
i2.8xlarge
cc2.8xlarge
cc1.4xlarge
cr1.8xlarge
Just look in the Network Performance column in the EC2 launch console and you'll see it says "10 Gigabit".
From the Placement Groups documentation:
A placement group is a logical grouping of instances within a single Availability Zone. Using placement groups enables applications to participate in a low-latency, 10 Gbps network. Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both.
To provide the lowest latency, and the highest packet-per-second network performance for your placement group, choose an instance type that supports enhanced networking.
The following instances support enhanced networking: C3, C4, D2, I2, M4, R3

monitor incoming http requests to a website with a loadbalancer

I am stuck with the problem of monitoring http requests of a website with an internet-facing loadbalancer. To be specific, I have hosted a website that uses a server farm of AWS EC2 instances with a loadbalancer (ELB) at the front. Now I want to get an idea about the request arrival rate per second (or per minute) to scale the server farm.
I have thought of an approach to perform this task online. The idea is to get the ELB log each minute and parsing it for http request count for the last minute. Just wondering whether there is any efficient way to do it online.
Any help would be highly appreciated.
Your best bet is to use AWS's cloudwatch to do the monitoring for you:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html
Elastic Load Balancing publishes data points to Amazon CloudWatch
about your load balancers and your back-end application instances.
CloudWatch allows you to retrieve statistics about those data points
as an ordered set of time-series data, known as metrics. Think of a
metric as a variable to monitor, and the data points represent the
values of that variable over time. Each data point has an associated
time stamp and (optionally) a unit of measurement. For example, total
number of healthy EC2 instances behind a load balancer over a
specified time period can be a metric.
Amazon CloudWatch provides statistics based on the metric data points
published by Elastic Load Balancing. Statistics are metric data
aggregations over specified periods of time. The following statistics
are available: Minimum (min), Maximum (max), Sum, Average, and Count.
When you request statistics, the returned data stream is identified by
the metric name and a dimension. A dimension is a name/value pair that
helps you to uniquely identify a metric. For example, you can request
statistics of all the healthy EC2 instances behind a load balancer
launched in a specific Availability Zone.

How do I use ELB's HealthyHostCount for monitoring in CloudWatch?

We have three EC2 instances—one in each availability zone (AZ) in the eu-west-1 region. They are loadbalanced using ELB. We'd like to monitor how many instances are registered at the loadbalancer, using CloudWatch. The problem ist: I don't really understand the HealthyHostCount metric.
For a deployment, we'd like to be able to de-register a single instance (take it out of the LB) without being notified. So the alarm would be: Notify if there is only 1 healthy instance left behind the loadbalancer for 5 minutes.
As far as I understand, HealthyHostCount (HHC) is the number of healthy instances that are registered with a given ELB, averaged over all AZs. If everything is okay, the HHC should be 1 (no matter over what period of time) because there is 1 instance in each AZ.
A couple of days ago, someone deployed without re-registering the instances, so there was only 1 instance being balanced. When we noticed that, we created an alarm that was to notify us when the average HHC sunk below 0.6 after 5 minutes. (If only 1 instance is registered in ELB, the HHC should average 0.33 for any period of time.) However, the alarm never changed to state "ALARM."
When I checked the HHC in CloudWatch, the HHC were numbers that didn't make sense (sum of 10.0 for a 5-minute interval is all I remember now).
It's all a big mess to me. Any time I think I understand the metric, the CloudWatch charts are all gibberish to me.
Could someone please explain how to use HHC to get an alarm when only 1 instance is registered? Is average HHC the way to go or should I use another metric?
The HealthyHostCount metric records one data value with the count of available hosts for each availability zone, each time a health check is executed. Your ELB health check has an Interval parameter that defines how many health checks are executed per minute.
If you are watching a Per-AZ metric, with a health check Interval of 10 seconds, with 2 healthy hosts in that AZ, you will see 6 data points per minute (60/10) with a value of 2. The average, max and min will be 2, but the sum will be 6*2=12.
If you have 3 AZs with 2 hosts each, again with an Interval=10, but you are looking at the Per-LB metric, you will see 3*6=18 data points per minute, each one with a value of 2. The average, max and min will be 2, but the sum will be 18*2=36
I recommend you to set-up an interval value that can divide 60 seconds (either 5, 6, 10, 15, 20, 30 or 60 seconds).
In your case, if your interval is 30 seconds, and you have 3 AZs and 1 server per AZ: You should expect 2 data points per AZ per minute, so set-up an alarm Per-LB, with a Period of 1 minute, for Sum of HealthyHostCount that triggers when value is LowerOrEqual than 2 (2 data values * 1 Healthy AZ * 1 healthy server = 2, the other 4 data values of the unhealthy AZs should be 0 so they won't affect the sum).
UPDATE:
It turns out that the number of health check executed also depends on the number of internal instances that shapes the ELB (ussually one per AZ), so if you are suffering a traffic spike, or enough load to saturate a single elb-internal-instance, the amount of internal servers inside the ELB will grow and you will have more data points unexpectedly. This may affect the sum value, only if you have lots of traffic. I didn't saw this issue with a peak load of 6k RPM distributed in 3 AZs. If this is your scenario, then using average is a safer bet, but I would recommend that you use LowerThan 0.65 as your threshold.
The link also makes me wonder how does the Cross-Zone Load Balancing feature affects the amount of data points...
This is an area where the CloudWatch web console doesn't expose everything that cloud watch can do. As the docs explain, HealthyHostCount is a per availability zone metric. The console lets you have HealthHostCount by availability zone (but across all load balancers) or by load balancer (but across all zones) but not sliced both ways.
If you only have one load balancer the simplest thing would be to setup one alarm on each of the per zone metrics. If you have multiple availability zones then you should be able to use the api to create an alarm slicing across availability zone and load balancer (again, one alarm per load balancer) but you can't do this from the web UI as far as I know.

AWS Autoscaling and Elastic load balancing

For my application I am using auto scaling, without using elastic load balancing, is there any performance issue for directly using Auto scaling without ELB?
Adi,
David is right.
Autoscaling allows you to scale instances (based on cloudwatch metrics, a single event, or on a recurring schedule).
Suppose you have three instances running (scaled with Autoscaling): how is traffic going to reach them? You need to implement a Load Balancing somewhere, that's why Elastic Load Balancing is so useful.
Without that, your traffic can only be directed in a poorly-engineered manner.
See Slide #5 of this presentation on slideshare, to get a sense of the architecture: http://www.slideshare.net/harishganesan/scale-new-business-peaks-with-auto-scaling
Best,
Autoscaling determines, based on some measurement (CPU load is a common measurement), whether or not to increase/decrease the number of instances running.
Load balancing relates to how you distribute traffic to your instances based on domain name lookup, etc. Somewhere you must have knowledge of which IP addresses are those currently assigned to the instances that the autoscaling creates.
You can have multiple IP address entries for A records in the DNS settings and machines will be allocated in a roughly round-robin fashion from that pool. But, keeping the pool up to date in real-time is hard.
The load balancer gives you an easy mechanism to provide a single interface/IP address to the outside world and it has knowledge of which instances it is load balancing in real time.
If you are using autoscaling, unless you are going to create a fairly complex monitoring and DNS updating system, you can reasonably assume that you must use a load balancer as well.

Load balancing: DNS round robin in front of hardware load balancers. How to share stickiness?

DNS Round Robin (DRR) permits to do cheap load balancing (distribution is a better term). It has the pro of permitting infinite horizontal scaling. The con is that if one of the web servers goes down, some clients continue to use the broken IP for minutes (min TTL 300s) or more, even if the DNS implements fail-over.
An Hardware Load Balancer (HLB) handles such web server failures transparently but it cannot scale its bandwidth indefinitely. An hot spare is also needed.
A good solution seems to use DRR in front to a group of HLB pairs. Each HLB pair never goes down and therefore DRR never keeps clients down. Plus, when bandwidth isn't enough you can add a new HLB pair to the group.
Problem: DRR moves clients randomly between the HLB pairs and therefore (AFAIK) session stickiness cannot work.
I could just avoid to use session stickiness but it makes better use of caches therefore is something that I want to preserve.
Question: is it possible/exist an HLB implementation where an instance can share its (sessionid,webserver) mapping with other instances?
If this is possible then a client would be routed to the same web server independently by the HLB that routed the request.
Thanks in advance.
Modern load balancers have very high throughput capabilities (gigabit). So unless you're running a huuuuuuuuuuge site (e.g. google), adding bandwidth is not why you'll need a new pair of load balancers, especially since most large sites offload much of their bandwidth to CDNs (Content Delivery Networks) like Akamai. If you're pumping a gigabit of un-CDN-able data through your site and don't already have a global load-balancing strategy, you've got bigger problems than cache affinity. :-)
Instead of bandwidth limits, sites tend to add additional LB pairs for geo-distribution of servers at separate data centers to ensure users spread across the world can talk to a server closest to them.
For that latter scenario, load balancer companies offer geo-location solutions, which (at least until a few years ago which was when I was following this stuff) were based on custom DNS implementations which looked at client IPs and resolved to the load balancer pairs Virtual IP address which is "closest" (in network topology or performance) to the client. These days, CDNs like Akamai also offer global load balancing services (e.g. http://www.akamai.com/html/technology/products/gtm.html). Amazon's EC2 hosting also supports this kind of feature for sites hosted there (see http://aws.amazon.com/elasticloadbalancing/).
Since users tend not to move across continents in the course of a single session, you automatically get affinity (aka "stickiness") with geographic load balancing, assuming your pairs are located in separate data centers.
Keep in mind that geo-location is really hard since you also have to geo-locate your data to ensure your back-end cross-data-center network doesn't get swamped.
I suspect that F5 and other vendors also offer single-datacenter solutions which achieve the same ends, if you're really concerned about the single point of failure of network infrastructure (routers, etc.) inside your datacenter. But router and switch vendors have high-availability solutions which may be more appropriate to address that issue.
Net-net, if I were you I wouldn't worry about multiple pairs of load balancers. Get one pair and, unless you have a lot of money and engineering time to burn, partner with a hoster who's good at keeping their data center network up and running.
That said, if cache affinity is such a big deal for your app that you're thinking about shelling out big $$$ for multiple pairs of load balancers, it may be worth considering some app architecture changes (like using an external caching cluster). Solutions like memcached (for linux) are designed for this scenario. Microsoft also has one coming called "Velocity".
Anyway, hope this is useful info-- it's admittedly been a while since I've been deeply involved in this space (I was part of the team which designed an application load balancing product for a large software vendor) so you might want to double-check my assumptions above with facts you can pull off the web from F5 and other LB vendors.
Ok, this is an ancient question, which I just found through a Google search. But for any future visitors, here is some additional clarifications:
Problem: [DNS Round Robin] moves clients randomly between the HLB pairs and therefore (AFAIK) session stickiness cannot work.
This premise is as best I can tell not accurate. It seems nobody really knows what old browsers might do, but presumably each browser window will stay on the same IP address as long as it's open. Newer operation systems probably obey the "match longest prefix" rule. Thus there shouldn't be much 'flapping', randomly switching from one load balancer IP to another.
However, if you're still worried about users getting randomly reassigned to a new load balancer pair, then a small modification of the classic L3/4 & L7 load balancing setup can help:
Publish DNS Round Robin records that go to Virtual high-availability IPs that are handled by L4 load balancers.
Have the L4 load balancers forward to pairs of L7 load balancers based on the origin IP address, i.e. use consistent hashing based on the end users IP to always route end users to the same L7 load balancer.
Have your L7 load balancers use "sticky sessions" as you want them to.
Essentially this is just a small modification to what Willy Tarreau (the creator of HAProxy) wrote years ago.
thanks for having put things in the right perspective.
I agree with you.
I did some reading and found:
Flickr: http://highscalability.com/flickr-architecture
4 billion queries per day --> about 50000 queries/s
Youtube: http://highscalability.com/youtube-architecture
100 million video views/day --> about 1200 video views/second
PlentyOfFish: http://highscalability.com/plentyoffish-architecture
600 pages/second
200 Mbps used
CDN used
Twitter: http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster
300 tweets/second
600 req/s
A very top end LB like this can scale up :
200,000 SSL handshakes per second
1 million TCP connections per second
3.2 million HTTP requests per second
36 Gbps of TCP or HTTP throughput
Therefore, you are right a LB could hardly become a bottleneck.
Anyway I found this (old) article http://www.tenereillo.com/GSLBPageOfShame.htm
where it is explained that geo-aware DNS could create availability issues.
Could someone comment on that article?
Thanks,
Valentino
So why not keep it simple and have the DNS server give out a certain IP address (or addresses) based on the origin IP address (i.e. use consistent hashing based on the end users IP to always give end users the same IP address(es)) ?
I'm aware that this only provides a simple and cheap load distribution mechanism.
I have been looking for this, but haven't found a DNS server which implements this (although Bind has some possibilities with views).

Resources