Please help resolve bottle neck in wait times for Http Responses? - performance

As far as a performance issue, the server is performing fine. With the exception of the http response wait times. This will become more of an issue as we grow our line of online services. All things being equal, I’m confused how this new server is it not loading pages as quickly as an older server running multiple websites, logging, etc…
Here is a screen shot from http://www.gtmetrix.com the online testing tool I’ve been using. These results are consistent regardless of time of day, The numbers here don’t make sense. The new site page is 75% smaller, yet its total time to live is only 26ms faster. In the below image the left side is NEW SERVER, the right side is OLD SERVER
The left portion of the timeline is the Handshaking portion. So, you can see, the new server, is about the same speed. The purple middle section, that represents wait time. It’s about 4 times the delay in milliseconds as OLD SERVER. The Grayish section on the right represents the actual time to download the file. You will also notice that the new server is significantly faster at downloading the response, this is most likely due to the 75% decrease in the response size.
You can see the complete results for the new server here. http://gtmetrix.com/reports/204.193.113.47/Kl614UCf
Here’s a table of the differences that I’m aware of, let me know if you see one that could be the culprit. I forgot to add this to the table, but the old server, is in production, right now serving requests, when www.gtmetrix is hitting it. In contrast, to my New server, which is just me connecting and generating requests.
My current hypothesis, is that the slowness is caused some combination of the server being virtualized, incorrect IIS settings, or the difference between 32bit and 64bit OSes

OK...
The server in in Sarasota(?), the test agent is in Vancouver so roughly 4,356KM apart (as the crow flies) so the best round trip time you could hope for is around 45ms.
Given it won't be a direct route and things like routers etc. will that add latency then the 155ms round-trip you seem to be getting is pretty reasonable.
Looking at the request for the HTML page the 344ms to complete it a pretty good time - basically 114ms to set up the connection, 115ms to receive the first bytes from server and then 155ms to get the complete response.
Unless you get decrease the roundtrip time then this time isn't going to improve much - have you tried testing from gtmetrix's Dallas server as a comparison?
If it is a slow server response then something like PAL (http://pal.codeplex.com/) is worth using as a first look to see what's happening on the server but I'd also look how quickly the SQL server is responding to the queries that are used on the test page.
A couple of things you want to look at later in the waterfall...
For the two files that are hosted from ajax.aspnetcdn.net it takes longer to resolve their DNS name than it does to download them so you may want to consider hosing them yourself
For the text based content e.g. HTML, CSS, JS etc. what level of gzip compression are you applying and are the compressed files being cached on the server? (the server times for them look a bit long)

Looking at the complete results, it seems the lower bound for the wait times would be 115ms. Not a single request is faster, most are around 125ms, and judging from the requested resources, there's a lot of static resources as well, so serving the response should not involve a lot of CPU. Even though responses are as small as 123 bytes, there's still this delay.
So it looks like a general issue, possibly not even related to IIS. Here some ideas how I'd try to debug this.
How long does a ping roundtrip take? (i.e. Is it a general network issue, routing etc.?)
How long do HTTP requests take when done from the server box (e.g. to localhost)? (If they all take more than ~100ms, start profiling inside the server box)

Related

What does multiplexing mean in HTTP/2

Could someone please explain multiplexing in relation to HTTP/2 and how it works?
Put simply, multiplexing allows your Browser to fire off multiple requests at once on the same connection and receive the requests back in any order.
And now for the much more complicated answer...
When you load a web page, it downloads the HTML page, it sees it needs some CSS, some JavaScript, a load of images... etc.
Under HTTP/1.1 you can only download one of those at a time on your HTTP/1.1 connection. So your browser downloads the HTML, then it asks for the CSS file. When that's returned it asks for the JavaScript file. When that's returned it asks for the first image file... etc. HTTP/1.1 is basically synchronous - once you send a request you're stuck until you get a response. This means most of the time the browser is not doing very much, as it has fired off a request, is waiting for a response, then fires off another request, then is waiting for a response... etc. Of course complex sites with lots of JavaScript do require the Browser to do lots of processing, but that depends on the JavaScript being downloaded so, at least for the beginning, the delays inherit to HTTP/1.1 do cause problems. Typically the server isn't doing very much either (at least per request - of course they add up for busy sites), because it should respond almost instantly for static resources (like CSS, JavaScript, images, fonts... etc.) and hopefully not too much longer even for dynamic requests (that require a database call or the like).
So one of the main issues on the web today is the network latency in sending the requests between browser and server. It may only be tens or perhaps hundreds of millisecond, which might not seem much, but they add up and are often the slowest part of web browsing - especially as websites get more complex and require extra resources (as they are getting) and Internet access is increasingly via mobile (with slower latency than broadband).
As an example let's say there are 10 resources that your web page needs to load after the HTML is loaded itself (which is a very small site by today's standards as 100+ resources is common, but we'll keep it simple and go with this example). And let's say each request takes 100ms to travel across the Internet to web server and back and the processing time at either end is negligible (let's say 0 for this example for simplicity sake). As you have to send each resource and wait for a response one at a time, this will take 10 * 100ms = 1,000ms or 1 second to download the whole site.
To get around this, browsers usually open multiple connections to the web server (typically 6). This means a browser can fire off multiple requests at the same time, which is much better, but at the cost of the complexity of having to set-up and manage multiple connections (which impacts both browser and server). Let's continue the previous example and also say there are 4 connections and, for simplicity, let's say all requests are equal. In this case you can split the requests across all four connections, so two will have 3 resources to get, and two will have 2 resources to get totally the ten resources (3 + 3 + 2 + 2 = 10). In that case the worst case is 3 round times or 300ms = 0.3 seconds - a good improvement, but this simple example does not include the cost of setting up those multiple connections, nor the resource implications of managing them (which I've not gone into here as this answer is long enough already but setting up separate TCP connections does take time and other resources - to do the TCP connection, HTTPS handshake and then get up to full speed due to TCP slow start).
HTTP/2 allows you to send off multiple requests on the same connection - so you don't need to open multiple connections as per above. So your browser can say "Gimme this CSS file. Gimme that JavaScript file. Gimme image1.jpg. Gimme image2.jpg... Etc." to fully utilise the one single connection. This has the obvious performance benefit of not delaying sending of those requests waiting for a free connection. All these requests make their way through the Internet to the server in (almost) parallel. The server responds to each one, and then they start to make their way back. In fact it's even more powerful than that as the web server can respond to them in any order it feels like and send back files in different order, or even break each file requested into pieces and intermingle the files together. This has the secondary benefit of one heavy request not blocking all the other subsequent requests (known as the head of line blocking issue). The web browser then is tasked with putting all the pieces back together. In best case (assuming no bandwidth limits - see below), if all 10 requests are fired off pretty much at once in parallel, and are answered by the server immediately, this means you basically have one round trip or 100ms or 0.1 seconds, to download all 10 resources. And this has none of the downsides that multiple connections had for HTTP/1.1! This is also much more scalable as resources on each website grow (currently browsers open up to 6 parallel connections under HTTP/1.1 but should that grow as sites get more complex?).
This diagram shows the differences, and there is an animated version too.
Note: HTTP/1.1 does have the concept of pipelining which also allows multiple requests to be sent off at once. However they still had to be returned in order they were requested, in their entirety, so nowhere near as good as HTTP/2, even if conceptually it's similar. Not to mention the fact this is so poorly supported by both browsers and servers that it is rarely used.
One thing highlighted in below comments is how bandwidth impacts us here. Of course your Internet connection is limited by how much you can download and HTTP/2 does not address that. So if those 10 resources discussed in above examples are all massive print-quality images, then they will still be slow to download. However, for most web browser, bandwidth is less of a problem than latency. So if those ten resources are small items (particularly text resources like CSS and JavaScript which can be gzipped to be tiny), as is very common on websites, then bandwidth is not really an issue - it's the sheer volume of resources that is often the problem and HTTP/2 looks to address that. This is also why concatenation is used in HTTP/1.1 as another workaround, so for example all CSS is often joined together into one file: the amount of CSS downloaded is the same but by doing it as one resource there are huge performance benefits (though less so with HTTP/2 and in fact some say concatenation should be an anti-pattern under HTTP/2 - though there are arguments against doing away with it completely too).
To put it as a real world example: assume you have to order 10 items from a shop for home delivery:
HTTP/1.1 with one connection means you have to order them one at a time and you cannot order the next item until the last arrives. You can understand it would take weeks to get through everything.
HTTP/1.1 with multiple connections means you can have a (limited) number of independent orders on the go at the same time.
HTTP/1.1 with pipelining means you can ask for all 10 items one after the other without waiting, but then they all arrive in the specific order you asked for them. And if one item is out of stock then you have to wait for that before you get the items you ordered after that - even if those later items are actually in stock! This is a bit better but is still subject to delays, and let's say most shops don't support this way of ordering anyway.
HTTP/2 means you can order your items in any particular order - without any delays (similar to above). The shop will dispatch them as they are ready, so they may arrive in a different order than you asked for them, and they may even split items so some parts of that order arrive first (so better than above). Ultimately this should mean you 1) get everything quicker overall and 2) can start working on each item as it arrives ("oh that's not as nice as I thought it would be, so I might want to order something else as well or instead").
Of course you're still limited by the size of your postman's van (the bandwidth) so they might have to leave some packages back at the sorting office until the next day if they are full up for that day, but that's rarely a problem compared to the delay in actually sending the order across and back. Most of web browsing involves sending small letters back and forth, rather than bulky packages.
Since #Juanma Menendez answer is correct while his diagram is confusing, I decided to improve upon it, clarifying the difference between multiplexing and pipelining, the notions that are often conflated.
Pipelining (HTTP/1.1)
Multiple requests are sent over the same HTTP connection. Responses are received in the same order. If the first response takes a lot of time, other responses have to wait in line. Similar to CPU pipeling where an instruction is fetched while another one is being decoded. Multiple instructions are in flight at the same time, but their order is preserved.
Multiplexing (HTTP/2)
Multiple requests are sent over the same HTTP connection. Responses are received in the arbitrary order. No need to wait for a slow response that's blocking others. Similar to out-of-order instruction execution in modern CPUs.
Hopefully the improved image clarifies the difference:
Request multiplexing
HTTP/2 can send multiple requests for data in parallel over a single TCP connection. This is the most advanced feature of the HTTP/2 protocol because it allows you to download web files asynchronously from one server. Most modern browsers limit TCP connections to one server. This reduces the additional round trip time (RTT), making your website load faster without any optimization, and makes domain sharding unnecessary.
Multiplexing in HTTP 2.0 is the type of relationship between the browser and the server that use a single connection to deliver multiple requests and responses in parallel, creating many individual frames in this process.
Multiplexing breaks away from the strict request-response semantics and enables one-to-many or many-to-many relationships.
Simple Ans (Source) :
Multiplexing means your browser can send multiple requests and receive multiple responses "bundled" into a single TCP connection. So the workload associated with DNS lookups and handshakes is saved for files coming from the same server.
Complex/Detailed Ans:
Look out the answer provided by #BazzaDP.

How to debug performance issue/optimize your meteor app

I just deployed my Meteor app onto a production server on Digital Ocean.
I noticed that for about 7500 documents, it takes about 3-5 seconds to fully fetch the objects (selectively taking only 3 fields) and populate the autocomplete data. I believe it should rather be instantenous for such number of data, so I am curious how I can debug performance issues from here and optimize more. How should I go about debugging performance issues for a Meteor app? I tried seeing the network tab but nothing seems to take more than a second. I am not sure why it takes 3-5 seconds for the search bar with an autocomplete feature to get ready. After a close inspection, populating autocomplete fields is instantaneous, and the time until subscribe function's callback is called is about 3 to 5 seconds.
I've already looked into Kadira, but it reported that everything was complete within milliseconds, so I am confused.
possibly related: Meteor's subscription and sync are slow
After all, is 3-5 seconds for 7800 documents with 2 fields reasonable?
Let me tell you what's really happening here.
Kadira shows the time taken to fetch the data from the server and queue it to the network. So, 500 - 700 ms is reasonable for that.
So, this 3-5 ms latency is the network latency. That means the time taken to send data to the client via the network. It's quite okay for 7500+ documents even with three fields over DDP.
So, my suggestion is to do the search on the server and use something like Search Source for that.
With that, you'll get the only data required to the client. Which reduce the latency and saves the CPU of your app.

Golang app-engine performance parameters

Using stock out-of-the-box configuration on a golang app-engine project, I am getting very disappointing performance. Any hints on what I might be missing? How should a golang google app be optimized?
Sending a few dozen requests, not more than six concurrently, I find only one instance handling all the requests, up to six requests concurrently (not sequentially) on that one instance - where I expected to see up to six instances. Possibly as a result, things seem to be blocking. I am seeing many timeouts, even on administrative functions like blobstore.Create(), which didn't happen when requests were being sent and processed individually.
EDIT1: These three lines
context.Infof("Sending request to blobstore to create %s as %s", Name, MimeType)
blobWriter, err := blobstore.Create(context, MimeType)
if err!=nil {
context.Warningf("Unable to access content store: %v",err)
}
are producing:
I 12:47:36.201 Sending request to blobstore to create download.jpg as application/octet-stream
W 12:47:41.251 Unable to access content store: Canceled: Deadline exceeded (timeout)
On failure here it is always about five seconds in blobstore.Create (a few milliseconds when it passes). Timeouts also occur in blobstore.Write and blobstore.Close and datastore, but with 20 to 30 second delays.
--End EDIT1.
There also seem to be performance issues. There is one computationally intensive bit, taking nearly a second to complete on my home machine (at 1.7GHz). According to the logged time stamps, that same code running on the remote app-engine (at 600MHz) is taking over 30 seconds on average, with a maximum of 109 seconds. That doesn't seem right!
EDIT2: The most computationally intensive bit used the resize function:
https://code.google.com/p/appengine-go/source/browse/example/moustachio/resize/resize.go
(with the obvious bug fixes). Not the most efficient resizer, but fast enough for now in a stand-along app. However it runs an order of magnitude slower in appengine (either the local SDK version 1.9 or running on Google's servers). Perhaps Google's version of the image library is slower? Probably the library? - A recursive fibonacci computation runs inside appengine in the same time as outside (same order of magnitude as C code).
--- End EDIT2
Any hints on how to get google app performance more similar to a multi-threaded stand-along application? So far these preliminary scaling experiments have been a miserable failure!
UPDATE: Using runtime.GOMAXPROCS(6), for a maximum of 6 concurrent requests, made no measurable difference. When using "manual_scaling" with more instances that requests was helpful, requests usually get assigned to different instances, but sometimes not - leading to problems.
A partial solution: Segregate computationally intensive requests on a separate module, running on separate instances, so that they do not block smaller more time-sensitive requests. Next, break down larger functions into several smaller requests, so that several can run "concurrently" on the same instance without timing out? (Make the client send several requests to do one job!)
It would be much better if I could ask the appengine just to start new instances for each request when none are available. Experimentally, starting a new instance is much cheaper than running two requests in slow motion on one instance.

Ajax use on website design

I just want to ask for your experience. I'm designing a public website, using jQuery Ajax in most of operations. I'm having some timeouts, and I think it should be for hosting provider cause. Any of you have expirience in this case and may advise me on some hints (especially on timeouts handling)?
Thanks in advance to all.
Esteve
If you have a half-decent host, chances are these aren't network timeouts but are rather due to insufficient hardware which causes your server-side scripts to take too long to answer. For example if you have an autocomplete field and the script goes through a database of 100,000 entries, this is a breeze for newer servers but older "budget" servers or overcrowded shared hosting servers might croak on it.
Depending on what your Ajax operations are, you may be able to break them down in shorter chunks. If you're doing database queries for example, use LIMIT and OFFSET and only return say, 5 entries at a time. When those 5 entries arrive on the client, make another Ajax call for 5 more, so from the user's point of view the entries will keep coming in and it will look fluid (instead of waiting 30s and possibly timing out before they see all entries at once). If you do this make sure you display a spiffy web 2.0 turning wheel to let the user know if they should be waiting some more or if it's done.

Distributed time synchronization and web applications

I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (i.e. better than NTP), however sometimes it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regards to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHTTPRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, but only solves one half the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server, either by allowing the clients to tell the server the current time according to them when they make a request (big security no-no, but I can mitigate some of the more obvious abuses of this), or having the server make a single request to one of my magic HTTP-over-NTP servers, and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.
Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as initial time of connection -- server time), and when the time of the event that will need to happen is known, notify the client with a message containing the number of milliseconds after the beginning value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t3 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
It seems to me like you're needing to listen to a broadcast event from a server in many different places. Since you can accept 2-3 seconds variation you could just put all your clients into long-lived comet-style requests and just get the response from the server? Sounds to me like the clients wouldn't need to deal with time at all this way ?
You could use ajax to do this, so yoǘ'd be avoiding any client-side lockups while waiting for new data.
I may be missing something totally here.
If you can assume that the clocks are reasonable stable - that is they are set wrong, but ticking at more-or-less the right rate.
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate it's offset from it's server (possible round-trip complications if you want lots of accuracy).
Store that, then you the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired up at the same time on all browsers.
Google found the way to define time as being absolute. It sounds heretic for a physicist and with respect to General Relativity: time is flowing at different pace depending on your position in space and time, on Earth, in the Universe ...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is used now by Google and will be available through Google Cloud Platform.

Resources