AWS Autoscaling and Elastic load balancing - amazon-ec2

For my application I am using Auto Scaling without Elastic Load Balancing. Is there any performance issue with using Auto Scaling directly, without ELB?

Adi,
David is right.
Auto Scaling allows you to scale the number of instances (based on CloudWatch metrics, a single event, or a recurring schedule).
Suppose you have three instances running (scaled with Auto Scaling): how is traffic going to reach them? You need to implement load balancing somewhere, which is why Elastic Load Balancing is so useful.
Without it, your traffic can only be directed in an ad hoc, poorly engineered manner.
See Slide #5 of this presentation on slideshare, to get a sense of the architecture: http://www.slideshare.net/harishganesan/scale-new-business-peaks-with-auto-scaling
Best,

Auto Scaling determines, based on some measurement (CPU load is a common one), whether to increase or decrease the number of instances running.
Load balancing is about how you distribute traffic to your instances (via DNS lookups, etc.). Somewhere, something must know which IP addresses are currently assigned to the instances that Auto Scaling creates.
You can have multiple A record entries in your DNS settings, and machines will be allocated in a roughly round-robin fashion from that pool. But keeping the pool up to date in real time is hard.
The load balancer gives you an easy mechanism to provide a single interface/IP address to the outside world, and it knows in real time which instances it is load balancing.
If you are using Auto Scaling, unless you are going to build a fairly complex monitoring and DNS-updating system, you can reasonably assume that you must use a load balancer as well.
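To make that concrete, here is a minimal boto3 sketch (the group and load balancer names are placeholders, not anything from the question) of attaching an Auto Scaling group to a classic ELB, so that every instance the group launches automatically starts receiving traffic:

```python
# Minimal sketch, assuming an existing Auto Scaling group and classic ELB.
# Names below are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# After this call, instances launched by the group are registered with the ELB
# (and deregistered when they terminate), so the ELB always knows the live set.
autoscaling.attach_load_balancers(
    AutoScalingGroupName="my-web-asg",
    LoadBalancerNames=["my-web-elb"],
)

# Sanity check: list the load balancers currently attached to the group.
resp = autoscaling.describe_load_balancers(AutoScalingGroupName="my-web-asg")
for lb in resp["LoadBalancers"]:
    print(lb["LoadBalancerName"], lb["State"])
```

With that in place, the ELB endpoint is the single address you publish in DNS; the "which IPs are live right now" bookkeeping described above is handled for you.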

Related

nifi ingestion with 10,000+ sensor data?

I am planning to use NiFi to ingest data from more than 10,000 sensors. There are 50-100 types of sensors, each of which will send a specific metric to NiFi.
I am pondering whether I should assign one port to listen to all the sensors, or one port per sensor type, to facilitate my data pipeline. Which is the better option?
Is there an upper limit on the number of ports I can "listen" on using NiFi?
#ilovetolearn
NiFi is such a powerful tool. You can do either of your ideas, but I would recommend doing whatever is easier for you. If you have sensor data sources that need different data flows, use different ports. However, if you can fire everything at a single port, I would do that. It makes the setup easier to implement, more consistent, easier to support later, and easier to scale.
In a large-scale, highly available NiFi deployment, you may want a load balancer in front to handle the inbound data. The sensors would push data toward a single host:port on the LB appliance, which then directs it to a NiFi cluster of 3, 5, 10, or more nodes.
I agree with the other answer that once scaling comes into play, an external load balancer in front of NiFi would be helpful.
Regarding the flow design, I would suggest using a single exposed port to ingest all the data, and then using RouteOnAttribute or RouteOnContent processors to direct specific sensor inputs into different flow segments.
One of the strengths of NiFi is the generic nature of flows given sufficient parameterization, so taking advantage of flowfile attributes to handle different data types dynamically scales and performs better than duplicating a lot of flow segments to statically handle slightly differing data.
Running multiple ingestion ports carries substantial performance overhead compared to a single port with routed flowfiles, so this approach will give you a large performance improvement. You can also organize your flow segments into hierarchical nested groups using the Process Group features, to keep different flow segments cleanly organized and to enforce access controls as well.
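As a generic illustration of the "single port, type carried in the data" idea (this is not NiFi code; the host, port, and field names are assumptions made for the sketch), each sensor client can tag its records so the flow routes on an attribute instead of on the ingest port:

```python
# Minimal sketch: every sensor type sends to the same TCP listener and tags
# each record with its type. Host, port, and field names are assumptions.
import json
import socket

INGEST_HOST, INGEST_PORT = "nifi.example.com", 9999  # hypothetical ListenTCP endpoint

def send_reading(sensor_id: str, sensor_type: str, value: float) -> None:
    record = {"sensor_id": sensor_id, "sensor_type": sensor_type, "value": value}
    with socket.create_connection((INGEST_HOST, INGEST_PORT)) as sock:
        # One JSON object per line; RouteOnAttribute/RouteOnContent can then
        # branch on sensor_type downstream instead of needing a port per type.
        sock.sendall((json.dumps(record) + "\n").encode("utf-8"))

send_reading("pump-17", "temperature", 71.3)
send_reading("gate-02", "vibration", 0.004)
```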
2020-06-02 Edit to answer questions in comments
Yes, you would have a lot of relationships coming out of the initial RouteOnAttribute processor at the ingestion port. However, you can segment these (route all flowfiles with X attribute in "family" X here, Y here, etc.) and send each to a different process group which encapsulates more specific logic.
Think of it like a physical network: at a large organization, you don't buy 1000 external network connections and hook each individual user's machine directly to the internet. Instead, you obtain one (plus redundancy/backup) large connection to the internet and use a router internally to direct the traffic to the appropriate endpoint. This has management benefits as well as cost, scalability, etc.
The overhead of multiple ingestion ports is that you have additional network requirements (site-to-site (S2S) communication is very efficient, but there is per-connection overhead), multiple ports to be opened and monitored, and CPU to schedule and run each port's ingestion logic.
I've observed this pattern in practice at scale in multinational commercial and government organizations, and the performance improvement was significant when switching to a "single port; route flowfiles" pattern vs. "input port per flow" design. It is possible to accomplish what you want with either design, but I think this will be much more performant and easier to build & maintain.

Crawling With Multiple EC2 Instances

I've got a crawling process I've written in Python that I am running on an EC2 instance on Amazon. I've written the crawler so that it reports back to a separate "hub" instance with its results. The hub processes the results, and the crawler is free to keep crawling. What I had in mind with this crawling instance is that it would be easy to clone several instances of the crawler, having each of them report back to the hub for processing.
So, at this point, I have one hub and 8 separate crawlers (all on their own instances) continually crawling and reporting back in etc.
My thinking with the small, separate crawlers was:
There is redundancy, so if one crawler gets hung up, the rest of the crawlers can keep working.
(This is an assumption) I have better network utilization if each crawler has its own independent IP.
I can spin up several crawlers or scale down depending on my current needs.
My question is: is this efficient? Would I be better off spinning up a larger instance and somehow splitting up the network utilization?
Thank you in advance for your input. Please forgive me if this is off topic for stackoverflow.
My view on your question:
(1) There is redundancy so if one crawler gets hung up, the rest of the crawlers can keep working.
Set up an Auto Scaling group to manage these crawler instances.
(2) (This is an assumption) I have better network utilization if each crawler has its own independent IP.
Yes, an EC2 instance can have its own public and private IP if it is created in a public subnet. Within one region, you can launch the instances in different Availability Zones (for example, the us-west-2 region has three AZs). With that, you can spread the network usage.
(3) I can spin up several crawlers or scale down depending on my current needs.
With an Auto Scaling group, this should be easy to control.
My question is: is this efficient?
If you can, place EC2 instances in different regions (US, EU, Asia, etc.) to reduce latency to some websites.
Would I be better off spinning up a larger instance and somehow splitting up the network utilization?
In your case, separate smaller instances should be the better solution, and it also saves you a lot of money. You might also consider using Spot Instances for the crawlers to save even more.
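For point (3), here is a minimal boto3 sketch (the group name is a placeholder) of how simple scaling the crawler fleet becomes once the crawlers sit in an Auto Scaling group:

```python
# Minimal sketch, assuming the crawler instances belong to an Auto Scaling
# group named "crawler-asg" (a hypothetical name).
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-west-2")

def set_crawler_count(count: int) -> None:
    """Grow or shrink the crawler fleet to the requested number of instances."""
    autoscaling.set_desired_capacity(
        AutoScalingGroupName="crawler-asg",
        DesiredCapacity=count,
        HonorCooldown=False,  # apply the change immediately
    )

set_crawler_count(8)  # scale out for a big crawl
set_crawler_count(2)  # scale back in when the queue drains
```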

monitor incoming http requests to a website with a loadbalancer

I am stuck with the problem of monitoring HTTP requests to a website behind an internet-facing load balancer. To be specific, I have hosted a website that uses a server farm of AWS EC2 instances with a load balancer (ELB) in front. Now I want to get an idea of the request arrival rate per second (or per minute) so I can scale the server farm.
I have thought of one approach to do this online: fetch the ELB log each minute and parse it for the HTTP request count over the last minute. I'm just wondering whether there is a more efficient way to do it online.
Any help would be highly appreciated.
Your best bet is to use AWS's CloudWatch to do the monitoring for you:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html
Elastic Load Balancing publishes data points to Amazon CloudWatch about your load balancers and your back-end application instances. CloudWatch allows you to retrieve statistics about those data points as an ordered set of time-series data, known as metrics. Think of a metric as a variable to monitor, and the data points represent the values of that variable over time. Each data point has an associated time stamp and (optionally) a unit of measurement. For example, total number of healthy EC2 instances behind a load balancer over a specified time period can be a metric.
Amazon CloudWatch provides statistics based on the metric data points published by Elastic Load Balancing. Statistics are metric data aggregations over specified periods of time. The following statistics are available: Minimum (min), Maximum (max), Sum, Average, and Count. When you request statistics, the returned data stream is identified by the metric name and a dimension. A dimension is a name/value pair that helps you to uniquely identify a metric. For example, you can request statistics of all the healthy EC2 instances behind a load balancer launched in a specific Availability Zone.
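As a concrete starting point, here is a minimal boto3 sketch (the load balancer name is a placeholder) that pulls the classic ELB's RequestCount metric from CloudWatch at one-minute resolution, which gives you the requests-per-minute arrival rate without parsing access logs:

```python
# Minimal sketch: per-minute request counts for a classic ELB from CloudWatch.
# The load balancer name is a hypothetical placeholder.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.now(timezone.utc)
start = end - timedelta(minutes=10)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/ELB",  # application load balancers use a different namespace
    MetricName="RequestCount",
    Dimensions=[{"Name": "LoadBalancerName", "Value": "my-web-elb"}],
    StartTime=start,
    EndTime=end,
    Period=60,            # one data point per minute
    Statistics=["Sum"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], int(point["Sum"]), "requests/minute")
```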

Rule of thumb for determining web server scale up and down?

What's a good rule of thumb for determining whether to scale up or down the number of cloud based web servers I have running? Are there any other metrics besides request execution time, processor utilization, available memory, and requests per second that should be monitored for this purpose? Should the weighted average, standard deviation or some other calculation be used for determining scale up or down? And finally, are there any particular values that are best for determining when to add or reduce server instances?
This question of dynamically allocating compute instances brings back memories of my control systems classes in engineering school. It seems we should be able to apply classical digital control systems algorithms (think PID loops and Z-transforms) to scaling servers. Spinning up a server instance is analogous to moving an engine throttle's stepper motor one notch to increase the fuel and oxygen rate in response to increased load. Respond too slowly and the performance is sluggish (overdamped), too quickly and the system becomes unstable (underdamped).
In both the compute and physical domains, the goal is to have resources match load. The good news is that compute systems are among the easier control systems to deal with, since having too many resources doesn't cause instability; it just costs money, similar to generating electricity in a system with a resistor bank to burn off excess.
It’s great to see how the fundamentals keep coming around again! We learned all that for a reason.
Your question is a hot research area right now. However, cloud providers can automate web-server scaling in different ways. For the details of how it works and which metrics affect scaling up and down, you can glance at this paper.
Amazon has announced Elastic Beanstalk, which lets you deploy an application to Amazon’s EC2 (Elastic Compute Cloud) and have it scale up or down, by launching or terminating server instances, according to demand. There is no additional cost for using Elastic Beanstalk; you are charged for the instances you use.
Also, you can check out Auto Scaling, which Amazon AWS offers.
Auto Scaling allows you to scale your Amazon EC2 capacity automatically up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you’re using increases seamlessly during demand spikes to maintain performance and decreases automatically during demand lulls to minimize costs. Auto Scaling is particularly well suited for applications that experience hourly, daily, or weekly variability in usage. Auto Scaling is enabled by Amazon CloudWatch and available at no additional charge beyond Amazon CloudWatch fees.
I recommend reading the details from Amazon AWS to dig into how their system scales web servers up and down.
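As one concrete way AWS expresses this today, here is a minimal boto3 sketch (the group and policy names are placeholders) of a target tracking scaling policy that holds average CPU near a set point, very much in the spirit of the control-loop analogy above:

```python
# Minimal sketch, assuming an existing Auto Scaling group named "web-asg"
# (a hypothetical name). CloudWatch adds or removes instances to keep the
# group's average CPU near the target value.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="keep-cpu-near-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,  # scale out above ~50% average CPU, scale in below it
    },
)
```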

Load balancing: DNS round robin in front of hardware load balancers. How to share stickiness?

DNS Round Robin (DRR) lets you do cheap load balancing (distribution is a better term). It has the advantage of permitting essentially unlimited horizontal scaling. The drawback is that if one of the web servers goes down, some clients continue to use the broken IP for minutes (minimum TTL 300 s) or more, even if the DNS implements fail-over.
A Hardware Load Balancer (HLB) handles such web server failures transparently, but it cannot scale its bandwidth indefinitely. A hot spare is also needed.
A good solution seems to be DRR in front of a group of HLB pairs. Each HLB pair never goes down, and therefore DRR never keeps clients stuck on a dead address. Plus, when bandwidth isn't enough you can add a new HLB pair to the group.
Problem: DRR moves clients randomly between the HLB pairs and therefore (AFAIK) session stickiness cannot work.
I could simply avoid session stickiness, but it makes better use of caches, so it is something I want to preserve.
Question: is it possible, or does an HLB implementation exist, where an instance can share its (sessionid, webserver) mapping with the other instances?
If this were possible, a client would be routed to the same web server regardless of which HLB routed the request.
Thanks in advance.
Modern load balancers have very high throughput capabilities (gigabit). So unless you're running a huuuuuuuuuuge site (e.g. google), adding bandwidth is not why you'll need a new pair of load balancers, especially since most large sites offload much of their bandwidth to CDNs (Content Delivery Networks) like Akamai. If you're pumping a gigabit of un-CDN-able data through your site and don't already have a global load-balancing strategy, you've got bigger problems than cache affinity. :-)
Instead of bandwidth limits, sites tend to add additional LB pairs for geo-distribution of servers at separate data centers to ensure users spread across the world can talk to a server closest to them.
For that latter scenario, load balancer companies offer geo-location solutions, which (at least as of a few years ago, when I was last following this stuff) were based on custom DNS implementations that looked at client IPs and resolved to the load balancer pair's Virtual IP address that is "closest" (in network topology or performance) to the client. These days, CDNs like Akamai also offer global load balancing services (e.g. http://www.akamai.com/html/technology/products/gtm.html). Amazon's EC2 hosting also supports this kind of feature for sites hosted there (see http://aws.amazon.com/elasticloadbalancing/).
Since users tend not to move across continents in the course of a single session, you automatically get affinity (aka "stickiness") with geographic load balancing, assuming your pairs are located in separate data centers.
Keep in mind that geo-location is really hard since you also have to geo-locate your data to ensure your back-end cross-data-center network doesn't get swamped.
I suspect that F5 and other vendors also offer single-datacenter solutions which achieve the same ends, if you're really concerned about the single point of failure of network infrastructure (routers, etc.) inside your datacenter. But router and switch vendors have high-availability solutions which may be more appropriate to address that issue.
Net-net, if I were you I wouldn't worry about multiple pairs of load balancers. Get one pair and, unless you have a lot of money and engineering time to burn, partner with a hoster who's good at keeping their data center network up and running.
That said, if cache affinity is such a big deal for your app that you're thinking about shelling out big $$$ for multiple pairs of load balancers, it may be worth considering some app architecture changes (like using an external caching cluster). Solutions like memcached (for linux) are designed for this scenario. Microsoft also has one coming called "Velocity".
Anyway, hope this is useful info-- it's admittedly been a while since I've been deeply involved in this space (I was part of the team which designed an application load balancing product for a large software vendor) so you might want to double-check my assumptions above with facts you can pull off the web from F5 and other LB vendors.
Ok, this is an ancient question, which I just found through a Google search. But for any future visitors, here is some additional clarifications:
Problem: [DNS Round Robin] moves clients randomly between the HLB pairs and therefore (AFAIK) session stickiness cannot work.
This premise is, as best I can tell, not accurate. It seems nobody really knows what old browsers might do, but presumably each browser window will stay on the same IP address as long as it's open. Newer operating systems probably obey the "match longest prefix" rule. Thus there shouldn't be much 'flapping', randomly switching from one load balancer IP to another.
However, if you're still worried about users getting randomly reassigned to a new load balancer pair, then a small modification of the classic L3/4 & L7 load balancing setup can help:
(1) Publish DNS Round Robin records that go to Virtual high-availability IPs that are handled by L4 load balancers.
(2) Have the L4 load balancers forward to pairs of L7 load balancers based on the origin IP address, i.e. use consistent hashing of the end user's IP to always route that user to the same L7 load balancer.
(3) Have your L7 load balancers use "sticky sessions" as you want them to.
Essentially this is just a small modification to what Willy Tarreau (the creator of HAProxy) wrote years ago.
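To make step (2) concrete, here is a minimal Python sketch (the pair names are invented) of consistent hashing from a client's source IP onto a ring of L7 load-balancer pairs; the same IP always maps to the same pair, and adding or removing a pair only remaps the clients that hashed near it:

```python
# Minimal sketch of consistent hashing: map each client IP to one of several
# L7 load-balancer pairs. Pair names are hypothetical.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, replicas: int = 100):
        # Several virtual points per node so load spreads evenly around the ring.
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(replicas))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, client_ip: str) -> str:
        idx = bisect.bisect(self._keys, _hash(client_ip)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["l7-pair-a", "l7-pair-b", "l7-pair-c"])
print(ring.node_for("203.0.113.7"))    # the same IP always lands on the same pair
print(ring.node_for("198.51.100.42"))
```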
Thanks for putting things in the right perspective.
I agree with you.
I did some reading and found:
Flickr: http://highscalability.com/flickr-architecture
4 billion queries per day --> about 50000 queries/s
Youtube: http://highscalability.com/youtube-architecture
100 million video views/day --> about 1200 video views/second
PlentyOfFish: http://highscalability.com/plentyoffish-architecture
600 pages/second
200 Mbps used
CDN used
Twitter: http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster
300 tweets/second
600 req/s
A very top-end LB like this can scale up to:
200,000 SSL handshakes per second
1 million TCP connections per second
3.2 million HTTP requests per second
36 Gbps of TCP or HTTP throughput
Therefore, you are right that an LB could hardly become a bottleneck.
Anyway, I found this (old) article http://www.tenereillo.com/GSLBPageOfShame.htm
which explains that geo-aware DNS could create availability issues.
Could someone comment on that article?
Thanks,
Valentino
So why not keep it simple and have the DNS server give out a certain IP address (or addresses) based on the origin IP address (i.e. use consistent hashing of the end user's IP so that the same user always gets the same IP address(es))?
I'm aware that this only provides a simple and cheap load distribution mechanism.
I have been looking for this, but haven't found a DNS server which implements this (although Bind has some possibilities with views).
