Bottleneck on Heroku SSL according to Blitz.io - heroku

I have a simple Django app set up that I have been testing with Blitz.io.
When I test with many dynos I can get thousands of req/s on http:// X.com
When I switch to https:// X.com I get no more than 72 req/s no matter how many dynos.
And on https:// X.herokuapp.com/ I get more, but it still tops out at a few hundred req/s.
Is this a fluke that won't show up with normal use cases? A Blitz issue? A heroku issue? Would resources just be scaled up with demand?

We had a very similar issue with blitz.io and loader.io. See our blog post at http://making.fiftythree.com/load-testing-an-unexpected-journey/ for more details. It's very possible that blitz.io is the cause of your issue with SSL. We found that BlazeMeter could handle the load quite well.
If cost is a concern you might also want to try open source tools like siege or JMeter.

This answer assumes https://X.com uses the SSL:endpoint heroku addon to serve a custom cert.
The ssl:endpoint addon is implemented using an AWS Elastic Load Balancer. ELBs distribute load amongst their nodes using DNS. In my experience, each individual ELB node isn't particularly beefy, and SSL negotiation/decryption is non-trivial from a CPU perspective. So it's important when load testing to:
Re-resolve the hostname with each request to distribute load amongst all the ELB ips, especially as new ones are added in response to increased traffic.
Ramp up your load test very slowly. Amazon advises to increase load on ELBs at most by 50% every 5 minutes.
I'm not particularly suprised if the difference between HTTP/HTTPS capacity in terms of concurrent connections allowed on a single ELB node is substantial, which if you're pinned to one IP, may account for the difference you're observing.
I don't know the details of the https://*.herokuapp.com stack, but I'm not surprised at all that it can service quite a bit more https traffic than a cold ssl:endpoint ELB.

Related

How to prevent being affected by data-center DDoS attack & maintainance related downtime?

I'm hosting a web application which should be highly-available. I'm hosting on multiple linodes and using a nodebalancer to distribute the traffic. My question might be stupid simple - but not long ago I was affected by a DDoS hitting the data-center. That made me think how I can be better prepared next time this happens.
The nodebalancer and servers are all in the same datacenter which should, of course, be fixed. But how does one go about doing this? If I have two load balancers in two different data centers - how can I setup the domain to point to both, but ignore the one affected by DDoS? Should I look into the DNS manager? Am I making things too complicated?
Really would appreciate some insights.
Thanks everyone...
You have to look at ways to load balance across datacenters. There's a few ways to do this, each with pros and cons.
If you have a lot of DB calls, running to datacenters HOT can introduce a lot of latency problems. What I would do is as follows.
Have the second datacenter (DC2) be a warm location. It is configured for everything to work and is constantly getting data from the master DB in DC 1, but isn't actively getting traffic.
Use a service like CLoudFlare for their extremely fast DNS switching. Have a service in DC2 that constantly pings the load balancer in DC1 to make sure that everything is up and well. When it has trouble contacting DC1, it can connect to CloudFlare via the API and switch the main 'A' record to point to DC2, in which case it now picks up the traffic.
I forget what CloudFlare calls it but it has a DNS feature that allows you to switch 'A' records almost instantly because the actual IP address given to the public is their own, they just route the traffic for you.
Amazon also have a similar feature with CloudFront I believe.
This plan is costly however as you're running much more infrastructure that rarely gets used. Linode is and will be rolling out more network improvements so hopefully this becomes less necessary.
For more advanced load balancing and HA, you can go with more "cloud" providers but it does come at a cost.
-Ricardo
Developer Evangelist, CircleCI, formally Linode

Amazon EC2 high availability

Following the scenario:
There is a service that runs 24/7 and a downtime is extremely expensive. This service is deployed on Amazon EC2. I am aware to the importance of deploying the application on two different availability zones and even in two different regions in order to prevent single points of failure. But...
My question is whether there are any additional configuration issues that may affect the redundancy of an application. I mean also to wrong configuration (for example wrong configuration of the DNS that will make it fail in case of a fail over).
Just to make sure I am clear - I am trying to create a list of validations that should be tested in order to ensure the redundancy of an application deployed on EC2.
Thank you all!
Just as a warning, just because you put your services in two availability zones doesn't mean that you're fault tolerant.
For example, one setup I had was to have 4 servers on a load balancer with us-east-1a us-east-1b as the two zones. Amazon's outage a few months ago caused some outages with my software because the load balancers weren't working properly. They were still forwarding requests but the two dead instances I had in one of the zones were also still receiving requests. Part of the load balancer logic is to remove dead instances, but since the load balancer queue was backlogged those instances were never removed. In my setup there are two load balancers once in each zone, so all of the requests to one load balancer were timing out because there were no instances to respond to the request. Luckily for me, the browser retried the request with the 2nd load balancer so the feeds I had were still loading but were very very slow.
My advice is to make sure that if you choose to go with only two availability zones over two regions that you make sure your systems are not dependent on any part of another availability zone, not even the load balancers. For me, it's not worth the extra cost to launch two completely independent systems in different zones so I'm unable to avoid this problem again in the future. But if your software is critical to the point where losing the service for 1 hour would pay for the cost of running extra hardware then it's definitely worth the extra servers to set it up correctly.
I also recommend paying for AWS support and working with their engineers to make sure that your design doesn't have any flaws for high-availability.
Recap of the issue I discussed: http://aws.amazon.com/message/67457/

High Performance Highly Available Tracking System

I currently hold a tracking service that records visits from various sources. At times we record the visits and redirect to our clients or we let clients call us to report visits. The architecture is two worker boxes configured behind a load-balancer. This system is setup using Amazon EC2 and the load-balancer used is Amazon's Elastic LB.
I did some benchmarking tests and have noticed significant network latencies. The traffic through load-balancer suffers atleast 2 times more delay than hitting any of the boxes directly.
Has anyone experienced such an issue and has attempted to solve it? Is this an Amazon EC2 specific issue?
Is there any other architecture in use that would lower my network latencies significantly. e.g. Using a HA such that traffic needn't go through a load-balancer but instead hits the end point servers directly? Before I start investing time on that I wanted to hear what others think of the same.
Thanks alot for your time,
Santosh
Change your LB and give it another try. HAProxy is great session/cookie aware L7 balancer and can be setup in Amazon cloud AFAIK. See this: http://agiletesting.blogspot.com/2009/02/load-balancing-in-amazon-ec2-with.html
You have to take into account that ELBs perform better after a while, then initially. Don't ask me why, but that's how it is -- loadbalancer warming?
It also really depends how much traffic you sent the ELB. Keep in mind that the hardware the ELB is provisioned on seems like a regular small instance. So the throughput is capped at ~25 MBit (last time I checked). If you require more, go dedicated.
In the end, I too would suggest that you look Haproxy on a dedicated instance. I'd expect some delay, 2x more delay sounds unreal. Maybe use another small instance and benchmark it directly against the ELB and then try a c1.medium.

HTTPS on Apache; Will it slow Apache?

Our company runs a website which currently supports only http traffic.
We plan to support https traffic too as some of the customers who link to our pages want us to support https traffic.
Our website gets moderate amount of traffic, but is expected to increase over time.
So my question is this:
Is it a good idea to make our website https only?(redirect all http traffic to https)
Will this bring down the websites performance?
Has anyone done any sort of measurement?
PS: I am a developer who also doubles up as a apache admin.
Yes, it will impact performance, but it's usually not too bad compared to the running all the DB queries that go into the typical dymanically generated page.
Of course the real answer is: don't guess, benchmark it. Try it both ways and see the difference. You can use tools like siege and ab to simulate traffic.
Also, I think you may have more luck with this question over at http://www.serverfault.com/
I wouldn't worry about the load on the server; unless you are serving high volumes of static content, the encryption itself won't create much of a burden, in my experience.
However, using SSL dramatically slows down web sites by creating a lot more latency in connection setup.
An encrypted session requires about* three times as much time to set up as an unencrypted one, and the exact time depends on the latency.
Even on low latency connections, it is noticeable to the end user, but on higher latency (e.g. different continents, especially Australasia where latency to America/Europe is quite high) it makes a dramatic difference and will severely impact the user experience.
There are things you can do to mitigate it, such as ensuring that keep-alives are on (But don't turn them on without understanding exactly what the impact is), minimising the number of requests and maximising the use of browser cache.
Using HTTPS also affects browser behaviour in some cases. Certain optimisations tend to get turned off for security reasons, and some web browsers don't store objects loaded over HTTPS in the disc cache, which means they'll need to get them again in a later session, further impacting the user experience.
* An estimate based on some informal measurement
Is it a good idea to make our website
https only?(redirect all http traffic
to https) Will this bring down the
websites performance?
I'm not sure if you really mean all HTTP traffic or just page traffic. A lot of sites unnecessarily encrypt images, javascript and a bunch of other content that doesn't need to be hidden. This kind of content comprises most of the data transferred in a request so
if you do find feel that HTTPs is taking too much out of the system you can recommend the programmers separate content that needs to be secured from the content that does not.
Most webservers, unless severely underpowered, do not even use a fraction of the CPU power for serving up content. Most production servers I've seen are under 10%, even when using some SSL traffic. I think it would be best to see where your current CPU usage is at, and then do some of your own benchmarking to see how much extra CPU usage is used by an SSL request. I would guess it isn't that much.
No, it is not good idea to make any website as only https. Page loading speed might be little slower, because your server has to perform redirection operation unnecessarily for each web page request. It is better idea to make only pages as https that may contain secure/personal/sensitive information of users or organization. Even if the user information passing through web pages, you can use https. The web page which have information that can be shown to all in the world can normally use http. Finally, it is up to your requirement. If all pages contain secure information, you may make the website as https only.

Load Balancing in Amazon EC2?

We've been fighting with HAProxy for a few days now in Amazon EC2; the experience has so far been great, but we're stuck on squeezing more performance out of the software load balancer. We're not exactly Linux networking whizzes (we're a .NET shop, normally), but we've so far held our own, attempting to set proper ulimits, inspecting kernel messages and tcpdumps for any irregularities.
So far though, we've reached a plateau of about 1,700 requests/sec, at which point client timeouts abound (we've been using and tweaking httperf for this purpose). A coworker and I were listening to the most recent Stack Overflow podcast, in which the Reddit founders note that their entire site runs off one HAProxy node, and that it so far hasn't become a bottleneck. Ack! Either there's somehow not seeing that many concurrent requests, we're doing something horribly wrong, or the shared nature of EC2 is limiting the network stack of the Ec2 instance (we're using a large instance type). Considering the fact that both Joel and the Reddit founders agree that network will likely be the limiting factor, is it possible that's the limitation we're seeing?
Any thoughts are greatly appreciated!
Edit It looks like the actual issue was not, in fact, with the load balancer node! The culprit was actually the nodes running httperf, in this instance. As httperf builds and tears down a socket for each request, it spends a good amount of CPU time in the kernel. As we bumped the request rate higher, the TCP FIN TTL (being 60s by default) was keeping sockets around too long, and the ip_local_port_range's default was too low for this usage scenario. Basically, after a few minutes of the client (httperf) node constantly creating and destroying new sockets, the number of unused ports ran out, and subsequent 'requests' errored-out at this stage, yielding low request/sec numbers and a large amount of errors.
We also had looked at nginx, but We've been working with RighScale, and they've got drop-in scripts for HAProxy. Oh, and we've got too tight a deadline [of course] to switch out components unless it proves absolutely necessary. Mercifully, being on AWS allows us to test out another setup using nginx in parallel (if warranted), and make the switch overnight later on.
This page describes each of the sysctl variables fairly well (ip_local_port_range and tcp_fin_timeout were tuned, in this case).
Not answering the question directly, but EC2 now supports load balancing through Elastic Load Balancing rather than running your own load balancer in an EC2 instance.
EDIT: Amazon's Route 53 DNS service now offers a way to point a top-level domain at an ELB with an "alias" record. Since Amazon knows the current IP address of the ELB, it can return an A record for that current IP rather than having to use a CNAME record, while still being free to change the IP from time to time.
Not really an answer to your question, but nginx and pound both have good reputations as load-balancers. Wordpress just switched to nginx with good results.
But more specifically, to debug your problem. If you aren't seeing 100% cpu usage (including I/O wait), then you are network bound, yes.
EC2 internally uses a gigabit network, try using an XL instance, so you have the underlying hardware to yourself, and don't have to share that gigabit network port.
Yes, You could use an off-site load balancer.. and on bare metal LVS is a great choice, but your latency will be awful! Rumour has it that Amazon is going to fix the CNAME issue. However they are unlikely to add https, indepth or custom health checks, feedback agents, url matching, cookie insertion (and some people with good architecture would say quite right too.) However thats why Scalr, RightScale and others are using HAProxy usually two of them behind a round robin DNS entry. Here at Loadbalancer.org we are just about to launch our own EC2 load balancing appaliance:
http://blog.loadbalancer.org/ec2-load-balancer-appliance-rocks-and-its-free-for-now-anyway/
We are planning on using SSH scripts to intergrate with autoscaling in the same way rightscale does, any comments appreciated on the blog.
Thanks
I would look at switching to a off-site load balancer, not in the cloud and run something like IPVS on top of it. [The reason why it would be off of amazon's cloud is because of kernel stuff] If Amazon doesn't limit the source IP of packets coming out of the you could go with a unidirectional load balancing mechanism. We do something like this, and it gets us about 800,000 simultaneous requests [though we don't deal with latency]. I also would say use "ab2" (apache bench), as it is a little more user friendly, and easier to use in my humble opinion.
Even though your issue solved. KEMP Technologies now have a fully blown load balancer for AWS. Might save you some hassle.

Resources