nginx slow response after 500 users online - performance

I have a file download site where I use nginx to deliver the files and Apache for the web part.
However, as soon as I hit around 500 users online and 1500 Mbit/s, nginx responses are delayed by 10 to 60 seconds. Once the connection is established, everything works as expected.
Apache has no issues on the browsing side, the HDDs are only at 50% utilization, iowait is 3, bandwidth is 70% saturated, and CPU is at 5%. I really don't understand why this is happening; I get no errors in the nginx logs.
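To narrow it down, this is the kind of check I can run against the download host: a minimal sketch (hostname and file path are placeholders, not my real setup) that times the TCP connect separately from the time to first byte, so it's clear whether the delay is in accepting the connection or in starting the response.

    # Minimal timing check: is the delay in the TCP connect itself, or in nginx
    # starting to send the response? Hostname and path below are placeholders.
    import socket
    import time

    HOST = "files.example.com"     # placeholder for the nginx download host
    PORT = 80
    PATH = "/some/large/file.zip"  # placeholder file path

    t0 = time.monotonic()
    sock = socket.create_connection((HOST, PORT), timeout=120)
    connect_time = time.monotonic() - t0

    request = f"GET {PATH} HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n"
    t1 = time.monotonic()
    sock.sendall(request.encode("ascii"))
    sock.recv(4096)                          # wait for the first response bytes
    first_byte_time = time.monotonic() - t1
    sock.close()

    print(f"TCP connect:        {connect_time:.3f} s")
    print(f"Time to first byte: {first_byte_time:.3f} s")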

Related

High response time for HTTPS requests on Elastic Beanstalk

I am currently hosting a Laravel project on Elastic Beanstalk. The issue is that requests made over HTTPS are experiencing much slower response times (an average of 5 seconds). I have ruled out internet issues, and the server's CPU and RAM are not fully utilized. Additionally, php-fpm (with nginx) is correctly configured with 16 pools on each instance (t3.small).
The problem seems to be with Axios (XHR request) but sometimes other HTML pages also experience the same issue. You can test this yourself by visiting https://laafisoft.bf (open the developer tools to check the response time). The configuration that I am using for the Load Balancer can be found in the image below. The certificate that I am using for HTTPS is issued by AWS Certificate Manager (RSA 2048).
When testing, I also noticed that requests over HTTP (port 80) were much faster (average of 200ms), but after some time the response time for HTTP requests increased to the same level as HTTPS requests. I am confident that the issue is not related to my Laravel application or a database problem. For comparison, I have the same version of the website hosted on DigitalOcean without a Load Balancer and it has much faster response times (https://demo.laafisoft.bf).
Any help is welcome, I'm new to AWS so maybe I'm missing something.
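For reference, this is roughly how the response times can be compared from outside the browser; a small sketch using the URLs above (note that urllib follows redirects, so if port 80 redirects to HTTPS the plain-HTTP figure will include the HTTPS request as well):

    # Rough comparison of average response times over HTTP and HTTPS.
    import time
    import urllib.request

    URLS = [
        "http://laafisoft.bf",         # behind the Elastic Beanstalk load balancer
        "https://laafisoft.bf",
        "https://demo.laafisoft.bf",   # same app on DigitalOcean, no load balancer
    ]

    for url in URLS:
        samples = []
        for _ in range(5):
            start = time.monotonic()
            with urllib.request.urlopen(url, timeout=30) as resp:
                resp.read()
            samples.append(time.monotonic() - start)
        avg = sum(samples) / len(samples)
        print(f"{url}: average {avg:.3f} s over {len(samples)} requests")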

What are some reasons migrating a site to Cloudflare could cause an increase in the number of active connections to a load balancer?

I use an application load balancer (AWS EC2 instance) to manage traffic to my site. I recently changed CDNs to Cloudflare. With my previous CDN, I observed an average of about 80 active connections per hour to the load balancer. After switching to Cloudflare, I now observe an average of 800-1200 active connections per hour.
The load balancer is using keep-alive connections with an idle timeout of 60 seconds. The difference in PoPs between Cloudflare and the legacy CDN is too small to account for such a large difference in connection count. I'm not sure what else I might be missing. Any pointers as to the source of this connection increase would be much appreciated. Thanks!
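For a rough sense of scale, Little's law (average open connections ≈ new connections per second × how long each one stays open) with purely illustrative numbers shows how a drop in connection reuse per client could account for a jump of this size on its own:

    # Back-of-envelope estimate (Little's law): average open connections is
    # roughly the rate of new connections times how long each one stays open.
    # All numbers here are illustrative assumptions, not measurements.
    requests_per_second = 50
    idle_timeout_s = 60                      # ALB idle timeout mentioned above

    # Assumed requests served per keep-alive connection before it goes idle.
    reuse = {"previous CDN": 40, "Cloudflare": 4}

    for cdn, requests_per_connection in reuse.items():
        new_connections_per_s = requests_per_second / requests_per_connection
        open_connections = new_connections_per_s * idle_timeout_s
        print(f"{cdn}: roughly {open_connections:.0f} connections held open")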

Website Speed 'Connect Time'

After noticing a drastically slow load time on one of my websites, I started running some tests on Pingdom - http://tools.pingdom.com/
I've been comparing 2 sites, and the drastic difference is the 'Connect' time. On the slower site it's around 2.5 seconds, whereas on my other sites it's down around 650 ms. I suppose it's worth mentioning that the slower site is hosted by a different company.
The only definition Pingdom offers is "The web browser is connecting to the server". I was hoping someone could elaborate on this a little for me, and point me in the direction of resolving it.
Thanks in advance
Every new TCP connection goes through a three-way handshake before the client can issue a request, e.g. a GET, to the web server.
The client sends a SYN to the server, the server responds with a SYN-ACK, and the client responds with an ACK and then sends the request.
How long this process takes is latency-bound, i.e. if the round trip to the server is 100 ms then the full handshake will take 150 ms, but since the client sends the request just after it sends the ACK, work on the basis that it costs one round trip.
Congestion and other factors can also affect the TCP connect time.
Connect times should be in the milliseconds range, not in the seconds range. My round-trip time from the UK to a server in NY is 100 ms, so that's roughly what I'd expect the TCP connect time to be if I were requesting something from a server there.
See @igrigorik's High Performance Browser Networking for a really in-depth discussion/explanation - http://chimera.labs.oreilly.com/books/1230000000545/ch02.html#TCP_HANDSHAKE
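If you want to measure the raw TCP connect time yourself, here is a minimal sketch (the hostname is a placeholder; DNS is resolved up front so it isn't counted):

    # Measure just the TCP connect (three-way handshake) time.
    import socket
    import time

    host = "www.example.com"   # replace with the slow site's hostname
    port = 80

    addr = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)[0][4]
    start = time.monotonic()
    with socket.create_connection(addr, timeout=10):
        pass
    connect_ms = (time.monotonic() - start) * 1000
    print(f"TCP connect to {host}: {connect_ms:.1f} ms")

The result should come out close to one round trip to the server; if it is consistently in the seconds range, the problem is on the connection side or the network path, not in generating the page.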

Frequent 504 Gateway Time-out on appharbor

I deployed an ASP.NET MVC 4 app on AppHarbor with very low traffic. Each time the application is accessed after deployment or after a few minutes of inactivity, I get a 504 Gateway Time-out error from nginx. Very annoying; what can I do to work around the error?
EDIT: support ticket on AppHarbor's support site:
The HTTP 504 is returned because the application doesn't respond within the request timeout. Application startup can take a little while, so sometimes a 504 may be returned on the initial request.
Applications on the free plan idle out after 20 minutes of inactivity. You can upgrade to one of the paid plans as they don't idle out after a period of inactivity.
We (AppHarbor) are working on decreasing the time it takes for applications to start up, which will mitigate the issue further. Note that the default request timeout was very recently increased to 120 seconds, so if you continue to experience this you're very welcome to open a ticket and let us know the application name so we can take a closer look.
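A quick way to see this behaviour is to time the first request after the app has been idle and then a second request straight after; a small sketch, with the application URL as a placeholder:

    # Time a (possibly cold) first request and then a warm second request.
    import time
    import urllib.error
    import urllib.request

    URL = "http://yourapp.apphb.com/"   # placeholder application URL

    for label in ("first request (possibly cold start)", "second request (warm)"):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(URL, timeout=130) as resp:
                status = resp.status
        except urllib.error.HTTPError as exc:   # e.g. the 504 from nginx
            status = exc.code
        print(f"{label}: HTTP {status} in {time.monotonic() - start:.1f} s")

On the free plan you would expect the first figure to be large (or a 504) after an idle period and the second to be back to normal.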

Load time when hitting a cache server (nginx/Squid) is greater than hitting GlassFish directly when I load test with up to 10,000 users on JMeter. Why?

I'm hosting a web service on GlassFish which will probably get hit a million times a day, so I thought of putting a cache server (reverse proxy) in front of it to reduce the load on GlassFish.
I tried implementing them and load tested:
directly on GlassFish,
with nginx,
with Squid.
The results were not as I expected: the average load time for 10,000 users for GlassFish, nginx and Squid was 168 ms, 245 ms and 198 ms respectively, the reverse of what I thought it would be.
The response contains cache-control headers, and it is a HIT each time it hits nginx or Squid, as can be seen from the access logs.
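For a single-request comparison outside JMeter, here is a small sketch that times one request per backend and prints any cache-status header (the ports, path, and the X-Cache / X-Cache-Status header names are assumptions about the setup; nginx only exposes its cache status if you add a header for $upstream_cache_status):

    # Single-client timing of the same request against each backend, plus any
    # cache-status header. Ports, path, and header names are assumptions.
    import time
    import urllib.request

    BACKENDS = {
        "GlassFish": "http://localhost:8080/service",   # placeholder ports/path
        "nginx":     "http://localhost:8081/service",
        "Squid":     "http://localhost:3128/service",
    }

    for name, url in BACKENDS.items():
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            cache = resp.headers.get("X-Cache") or resp.headers.get("X-Cache-Status")
        elapsed_ms = (time.monotonic() - start) * 1000
        print(f"{name}: {elapsed_ms:.0f} ms, cache header: {cache}")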
