Causes of high network latency - performance

I have a site that is moving incredibly slowly right now. Both Safari's inspector and Firebug are reporting that most of the load time is due to latency. The actual download is happening in less than a second. There's a lot of database activity in play (though the metrics on that indicate that it's pretty healthy), but what else can cause really high latency? Is it a purely network thing or are there changes I can make to the app to improve the latency numbers?
I'm using YSlow to help identify performance improvements, but on the whole, I don't see it reporting anything that seems crazy unreasonable. Opportunities for improvement, certainly, but nothing that seems like it would cause the huge load times I'm seeing.
Thanks.
UPDATE
Some background and metrics, in case it's useful. This is a CakePHP application and I'm using my UsersController::login action as the benchmark. For the sake of identifying how much of a factor the application code plays in this, I've printed a stacktrace immediately upon entering UsersController::beforeFilter(). Here's output:
UsersController::beforeFilter() - APP/controllers/users_controller.php, line 13
Controller::startupProcess() - CORE/cake/libs/controller/controller.php, line 522
Dispatcher::_invoke() - CORE/cake/dispatcher.php, line 187
Dispatcher::dispatch() - CORE/cake/dispatcher.php, line 171
[main] - APP/webroot/index.php, line 83
Load times, as shown by Safari's inspector range from 11.2 seconds to 52.2 seconds. This would seem to point me away from the application code and maybe something with my host, but maybe I'm completely misinterpreting this or oversimplifying it?

If you cannot identify directly a slow moving component of your application, there are a number of other steps along the way that can certainly slow your site down. Whenever I'm experiencing unusually long polling, I typically start by looking at the local DNS and then onto my hosted DNS. Sometimes a cache refresh (on their part, not yours) can cause a lot of polling until their database has caught up.
Else, they might actually have a service outage and your requests are being made to their secondary or backup server. If everything seems fine in terms of domain resolution, your hosting provider might be experiencing a service outage that can take a number of different shapes like serving static content from their backups or over-allocating shared resources until everything is running as it should. You can experience a ton of what they call throttling on shared cloud architectures when they have a box go down. On the plus side, you don't have a total outage in this circumstance.
One time, and this was just in a shared grid configuration, I had a processor go to hell. The bizarre part of it was that static content was still serving from a backup, but it was still polling against our database (which was on a different server) and causing our account to throttle because of over allocation on the backup. Wasn't our fault, but the host started sending nasty emails about our excessive long-polls. Moral of the story is, if it's not your application, and it's out of the blue, somewhere along the line I'll bet you'll find some hardware failure or misconfiguration.
Also now that I think of it, if you are syndicating some outside content (be it server or browser side) it might not be in your chain of responsibility altogether. If you are serving ads for example from a subscriber service, they might be having a high-load period or outage. These are just the steps that I would take to narrow down the culprit.

Probably this will be not the solution for you, but when I has doggy slow safari (and FF too) I simply changed the DNS servers to opendns (208.67.222.222, 208.67.220.220) and all my problems are resolved.

Related

ASP.NET 5 Web API application intermittently unresponsive

We are working on an ASP.NET 5 Web API project that is in production now but we are experiencing an issue where it becomes unresponsive intermittently throughout the day.
A few notes about the application architecture. It is an ASP.NET Web API project using a MariaDB database on a separate EC2 instance within the same private network. The connection string uses the private IP of the database server to avoid any name resolution issues. The site is hosted via IIS 10.
The application itself has been developed carefully following the best practices provided by Microsoft. Heavy focus on async operations, minimizing query response times and offloading more expensive operations into background services.
The app is extremely responsive. It performs with sub 100ms responses on almost all requests, even the more complicated requests, and all the way up until it becomes unresponsive this high level of performance remains the same. We tend to see between 10-30 requests per second and 300-500 select queries per second at peak usage so not too extreme. However, randomly (2-3 times over a 24 hour period) it will begin hanging on requests and simply not respond to the request. During this time, the database is still extremely responsive and we are never over 300 connections out of our 512 connection limit.
The resources on the application server itself are never really taxed much at all. The CPU never gets above ~20% and the memory usage sits around 20-30%.
If I were to stop the site in IIS and start it again while this is happening, it will quickly come back online. If I don't it will be down for a few minutes until IIS finally kills it due to a failed health check. There are no real errors generated as a response to the issue other than typical errors caused by the hanging of the process such as connection terminated errors. The only thing I have seen before that gave me pause was the fact that there a few connection timeouts when getting the connection from the pool, but like I said, the connections to the server are never close to the limit.
Also, this app and version has been in production for months and it wasn't until the traffic volume started to grow that we started seeing these issues. At this point, I am at a loss for next steps of troubleshooting and I'm seeking suggestions.
In IIS App Pool advanced settings set Start Mode to AlwaysRunning
I never found a root cause for this issue, however, after updating to newer versions of .NET MVC this issue went away. My best guess is that changes with the Kestrel possibly resolved this issue, although, I have no idea what specific change that might have been. I have gone through the change logs a few times and didn't see anything that specifically jumped out at me.

Sometimes pages on my website load very slowly

Across all browsers/devices, I find random different pages, at random times, are very slow to load/don't load. The browser is stuck on 'Waiting for website.com'. I will wait 20 seconds and nothing will happen until I manually refresh the page. As I realise this is very vague, can you suggest a) most likely issues to look for first or b) some diagnostic tools that I could use to try and de-bug the issue as a starting point, so that my hosts/developers can solve the issue. Here are some results of recent speed tests.
One thing to also add is that, it seems it more often gets stuck on particular pages. Namely the pages where users take practice tests. After each time the user clicks 'Next', their selected answer is inserted into the database. My speculation is that potentially it's an issue with the DB itself, or the process which inserts into the database. It's when clicking 'Next', that the whole website sometimes just dies as described above.
Results from Google Speed Test
Waterfall image
A wait time of 20secs at random times and random pages could possibly be due to stop-the-world garbage collection. So GC logs are probably a good starting point.
A thread sampler such as Djigger a colleague of mine wrote might probably also help you figuring out what the machine is doing during the 20 seconds.
If that doesn't help I suggest to use a Profiler or better an APM tool to monitor whats going on on your system. Those tools give a you a broader insight of the internals.
You need to run a few page speed tests and look at the waterfall images.
It is very common on shared servers for the server to be too busy to get to your request. 20 seconds would indicate a serious issue with the server.
Another common reason is the page has a link to a third party resource and that resource is too often unavailable.
In your case the culprit is website.com and I assume that is your site.
Use something like webpagetest.org to run the tests.
In the waterfall image below
Dark Green is DNS lookup time.
Orange is the time for Browser to connect to server.
Green is the wait time for server to put image in output buffer.
Blue is the time for the server to transmit to the Browser.
The problem with the sample waterfall page is the index page took 4 seconds to be generated or retrieved. Most likely this is a Word Press site with plugins.
I suspect yours may be 20 seconds. But due to the randomness, is is also a good possibility it is a page resource that is stalled.
If it is the index page, then you likely have a poor ISP and or one of the other users of the server is hogging the CPU.
Keep running the tests until you see the problem occur.
It will be very obvious where the problem is located.
You can post the waterfall image and send me a message if you have any questions.
Waterfall from webpagetest.org

Web app initial load time

I am using a shared hosting plan at Bluehost to host a golf tournament live scoring mobile web app. I am caching everything I can on Cloudflare, and spent quite some time on overall optimization of the initial download & rendering times. There might be more I could do, but without question my single biggest issue is the initial call to my website: www.spanishpointscup.org. Sometimes this seems to be related to DNS lookup and other times related to Waiting(TTFB).
Below are 2 screen shot images of the network calls that show variations in accessing my index.html. Sometimes this initial file load can be even longer. Very rarely are any of the other files downloaded creating a long delay time, so my only focus now is the initial file load. I think that even if I had server side rendering, I would still have this issue.
Does anyone have specific recommendations that they are confident will help me? Switch to VPS or other host? Thank you.
This is typical when you use a shared server.
The DNS has nothing to do with the issue. DNS has to do with the request not the response. It is the Browser that must resolve the the domain name to an ip address.
The delay you are seeing is due to the server being busy and your page is sitting in a queue waiting behind other processes. Possibly you have a CPU grabbing neighbor on your shared server. Or Bluehost has some performance issues.
You will likely notice some image files take an excessively long time to transmit. Which image is slow will appear to be random with each fresh (not in cache) page load.
UPDATE
After further review I noticed the "wait" times are excessive. Wait time is in green on your waterfall. Notice how the transmit time (blue) is short. This is the time it takes the server to retrieve the page from the disk and put it into the transmit buffer. 300-400 millisecond is excessive.
Find a new service provider.

Optimising Magento Loading Speed - Can't Identify Why Initial Recieving So Slow

While our website is not yet complete graphically and design wise, most of the backend operations are near completion.
However, after optimising the mysql database we are still receiving a significant initial receiving period when tested on pingdom.com:
http://tools.pingdom.com/fpt/#!/IuoBna86v/http://foscam-uk.com
According to Pingdom:
The yellow part is the time it takes to resolve the hostname and similar (before the connection is initiated to the web server), the green part is connecting to the web server, and the blue part is the time it takes to retrieve the content from the webserver.
Upon asking our managed VPS support team we got the response : 'Have you tried optimizing your script? I believe that the high wait time on there indicates actual website loading time (meaning for your script to load); not actual connection to the website/server.'
Now, pingdom shows the js/css loading relatively quickly, the mysql database side of things doesn't seem to be slowing anything down either - does anyone have any suggestions of what this could be or might be causing it?
Thank you very much for your time and help.
89 requests are too many.
Reduce number of image request by creating sprites.This is pretty important from what is shown in pingdom.
Keep Alive should be set to On and Keep alive time should be a bit higher(15 seconds or so).
Use of compiler plus merge and minify js/css is recommended.
Change the hosting provider. 8 second loading is very very slow. It means that it actually is around 15-17 seconds for a user that doesn't have cached parts of your site (first time visitor). My site www.bebepunk.ro loads according to pingdom in 2.5 seconds and users still complain about the slowness of the site. Check also with http://www.webpagetest.org for both values.

Node.js suddenly getting extremely slow

We have this architecture with 2 node processes.
One polls a private API and pushes the changes to the second node if any.
The second node process the data and calls a bunch other API's and eventually emits a change event to the client, a HTML5 website, with socket.io
This second node will always process the data and will always emit changes even if no clients are connected. So in my opinion the CPU or mem usage is not that greatly affected by the number of connected clients. Also note that this architecture is still running on a private staging environment.
Everything runs fine and we're ready to go live until we noticed after couple of days, maybe a week, the second node suddenly gets extremely slow while the first node is still fine.
It gets so bad that even the connection between the two nodes gets timed out and they are on the same network over localhost. It also takes more then 10 seconds to browse to the socket.io/socket.io.js file.
I know its very hard to understand the problem without seeing any code but I'm kinda pulling my hair out because we have to go live in couple of days and my logs are not revealing anything and google isn't helping either.
Whats a good practice towards building Have you ever experienced anything like this? What was the problem and how did you fix it?
Whats a good monitor and profiler for node.js? (preferably free)
What are good practices towards building a node.js app with makes a lot of outgoing API calls?
Anything or anyone that could help me in the right direction of solving or even discovering the actual problem will be greatly appreciated!
Thank you!
Never experienced anything like this but may be the second node is blocking the event loop by doing CPU intensive work or waiting for some resource synchronously.
Add some logging in your code to see how much time second node is taking for processing each change pushed by first node. May be some type of change consumes CPU for 10 seconds or so to complete.
You should also start monitoring memory, CPU and network connections. When things slow down your monitoring will provide some clue as to where is the bottle neck.
For monitoring you can try following 3 tools
nodetime
hummingbird
node-monitor
Also read http://nodetime.com/blog/monitoring-nodejs-application-performance
It sounds like you have a memory leak somewhere in the second node, maybe from calling too many anonymous functions etc... do you notice your RAM usage slightly creeping up as it runs?

Resources