HTTP/2: Why are multiple HTTP requests better? Or is the statement false?

I was reading through https://hackernoon.com/how-it-feels-to-learn-javascript-in-2016-d3a717dd577f
A line says
Yes, but because HTTP/2 is coming now multiple HTTP requests are actually better.
Embedded within all the sarcasm in that post, this statement is presented as true. So I would like to know whether it actually is true and, if so, how multiple requests are better. From what I learned in my computer networks class, for each new linked resource a bunch of messages or packets are exchanged between the end hosts, eating resources/time/space on all the routers/bridges on that path.

In HTTP/2, multiple requests mean something slightly different than in HTTP/1.1. HTTP/2 tries to use a single connection for all requests, and that connection is closed only after the page has completed all its tasks. This way you can dynamically load smaller pieces of a library and share the connection overhead between them, which can amount to a smaller download overall than one large JS file, the approach that is efficient in HTTP/1.1.
Marc B had it right with the groceries analogy: HTTP/2 is one trip to the server that grabs multiple pieces and returns, whereas HTTP/1.1 is a series of trips to grab the same pieces.

Related

Batch HTTP Request Performance gain

I want to know the performance gain from doing an HTTP batch request. Is it only reducing the number of round trips to one instead of n, where n is the number of HTTP requests? If so, I guess you could keep the HTTP connection open, send your HTTP messages through it, and close it once finished to get the same performance gain.
The performance gain of doing batch requests depends on what you are doing with them. However just as an agnostic approach here you go:
If you can manage a keep-alive connection, then yes, you don't have to do the initial handshake for each request. That reduces some overhead and certainly saves time spent handling subsequent packets along this connection. Because of this you can "pipeline" requests and decrease overall load latency (all else not considered). However, requests in HTTP/1.1 are still bound to be FIFO, so one slow request can hold up the ones behind it (HTTP/2 allows asynchronous handling). So even with a keep-alive connection you can still have significant latency between requests. This is where batching is useful.
This can be mitigated further by batching. If possible you lump all the data needed for subsequent requests into one and this way everything is processed together and sent back as one response. Sure it may take a bit longer to handle a single packet as opposed to the sequential method, but your throughput is increased per time because roundtrip latency for request->response is not multiplied. Thus you get an even better performance gain in terms of requests handling speeds.
Naturally this approach depends on what you're doing with the requests for it to be effective. Sometimes batching can put too much stress on a server if you have a lot of users doing this with a lot of data so to increase overall concurrent throughput across all users you sometimes need to take the technically slower sequential approach to balance things out. However, the best approach will be known by you upon some simple monitoring and analysis.
And as always, don't optimize prematurely :)
Consider this typical scenario: the client has the identifier of a resource which resides in a database behind an HTTP server, and they want to get an object representation of that resource.
The general flow to execute that goes like this:
1. The client code constructs an HTTP client.
2. The client builds a URI and sets the proper HTTP request fields.
3. Client issues the HTTP request.
4. Client OS initiates a TCP connection, which the server accepts.
5. Client sends the request to the server.
6. Server OS or webserver parses the request.
7. Server middleware parses the request components into a request for the server application.
8. Server application gets initialized, the relevant module is loaded and passed the request components.
9. The module obtains an SQL connection.
10. Module builds an SQL query.
11. The SQL server finds the record and returns that to the module.
12. Module parses the SQL response into an object.
13. Module selects the proper serializer through content negotiation, JSON in this case.
14. The JSON serializer serializes the object into a JSON string.
15. The response containing the JSON string is returned by the module.
16. Middleware returns this response to the HTTP server.
17. Server sends the response to the client.
18. Client fires up their version of the JSON serializer.
19. Client deserializes the JSON into an object.
And there you have it, one object obtained from a webserver.
Now each of those steps along the way is heavily optimized, because a typical server and client execute them so many times. However, even if each of those steps only takes a millisecond, when you have, for example, fifty resources to obtain, those milliseconds add up fast.
So yes, HTTP keep-alive cuts away the time the TCP connection takes to build up and warm up, but each and every other step will still have to be executed fifty times. Yes, there's SQL connection pooling, but every query to the database adds overhead.
So instead of going through this flow fifty separate times, if you have an endpoint that can accept fifty identifiers at once, for example through a comma-separated query string or even a POST with a body, and return their JSON representation at once, that will always be way faster than individual requests.
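As a rough illustration of such an endpoint, here is a minimal sketch of the server-side part that turns a comma-separated list of identifiers into a single SQL query and a single JSON response. The table name `resources`, its columns, and the function name `fetch_batch` are made up for this example, and an in-memory SQLite database stands in for whatever database sits behind the real server.

```python
import json
import sqlite3


def fetch_batch(conn, id_param):
    """Resolve a comma-separated id list (e.g. "1,2,3") with one query.

    Hypothetical helper: 'resources(id, name)' is an assumed schema.
    """
    ids = [int(part) for part in id_param.split(",") if part.strip()]
    if not ids:
        return "[]"
    # One placeholder per id, so the values are still bound safely.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, name FROM resources WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    # Serialize all objects at once instead of once per request.
    return json.dumps([{"id": row[0], "name": row[1]} for row in rows])


# Demo data: fifty rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 51)],
)

print(fetch_batch(conn, "1,2,3"))
```

The request parsing, connection handling, query, and serialization each happen once for the whole batch rather than once per identifier. Databases usually cap the number of bound parameters per statement, so a real endpoint would also want an upper limit on the number of ids it accepts.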

What does multiplexing mean in HTTP/2

Could someone please explain multiplexing in relation to HTTP/2 and how it works?
Put simply, multiplexing allows your browser to fire off multiple requests at once on the same connection and receive the responses back in any order.
And now for the much more complicated answer...
When you load a web page, it downloads the HTML page, it sees it needs some CSS, some JavaScript, a load of images... etc.
Under HTTP/1.1 you can only download one of those at a time on your HTTP/1.1 connection. So your browser downloads the HTML, then it asks for the CSS file. When that's returned it asks for the JavaScript file. When that's returned it asks for the first image file... etc. HTTP/1.1 is basically synchronous - once you send a request you're stuck until you get a response. This means most of the time the browser is not doing very much, as it has fired off a request, is waiting for a response, then fires off another request, then is waiting for a response... etc. Of course complex sites with lots of JavaScript do require the browser to do lots of processing, but that depends on the JavaScript being downloaded so, at least at the beginning, the delays inherent in HTTP/1.1 do cause problems. Typically the server isn't doing very much either (at least per request - of course they add up for busy sites), because it should respond almost instantly for static resources (like CSS, JavaScript, images, fonts... etc.) and hopefully not take much longer even for dynamic requests (that require a database call or the like).
So one of the main issues on the web today is the network latency in sending the requests between browser and server. It may only be tens or perhaps hundreds of milliseconds, which might not seem much, but they add up and are often the slowest part of web browsing - especially as websites get more complex and require extra resources (as they are getting) and Internet access is increasingly via mobile (with higher latency than broadband).
As an example let's say there are 10 resources that your web page needs to load after the HTML is loaded itself (which is a very small site by today's standards as 100+ resources is common, but we'll keep it simple and go with this example). And let's say each request takes 100ms to travel across the Internet to the web server and back and the processing time at either end is negligible (let's say 0 for this example for simplicity's sake). As you have to send each request and wait for a response one at a time, this will take 10 * 100ms = 1,000ms or 1 second to download the whole site.
To get around this, browsers usually open multiple connections to the web server (typically 6). This means a browser can fire off multiple requests at the same time, which is much better, but at the cost of the complexity of having to set up and manage multiple connections (which impacts both browser and server). Let's continue the previous example and also say there are 4 connections and, for simplicity, let's say all requests are equal. In this case you can split the requests across all four connections, so two will have 3 resources to get, and two will have 2 resources to get, totalling the ten resources (3 + 3 + 2 + 2 = 10). In that case the worst case is 3 round trips or 300ms = 0.3 seconds - a good improvement, but this simple example does not include the cost of setting up those multiple connections, nor the resource implications of managing them (which I've not gone into here as this answer is long enough already, but setting up separate TCP connections does take time and other resources - to do the TCP connection, HTTPS handshake and then get up to full speed due to TCP slow start).
HTTP/2 allows you to send off multiple requests on the same connection - so you don't need to open multiple connections as per above. So your browser can say "Gimme this CSS file. Gimme that JavaScript file. Gimme image1.jpg. Gimme image2.jpg... Etc." to fully utilise the one single connection. This has the obvious performance benefit of not delaying sending of those requests waiting for a free connection. All these requests make their way through the Internet to the server in (almost) parallel. The server responds to each one, and then they start to make their way back. In fact it's even more powerful than that, as the web server can respond to them in any order it feels like and send back files in a different order, or even break each file requested into pieces and intermingle the files together. This has the secondary benefit of one heavy request not blocking all the other subsequent requests (known as the head of line blocking issue). The web browser then is tasked with putting all the pieces back together. In the best case (assuming no bandwidth limits - see below), if all 10 requests are fired off pretty much at once in parallel, and are answered by the server immediately, this means you basically have one round trip or 100ms or 0.1 seconds, to download all 10 resources. And this has none of the downsides that multiple connections had for HTTP/1.1! This is also much more scalable as resources on each website grow (currently browsers open up to 6 parallel connections under HTTP/1.1 but should that grow as sites get more complex?).
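The arithmetic in this example can be reproduced with a small model. This is only a sketch under the same simplifying assumptions as the text: every request costs exactly one round trip, processing time is zero, bandwidth is unlimited, and connection set-up is free. The function name `load_time_ms` is made up for the illustration, and HTTP/2 multiplexing is approximated as "as many requests in flight as there are resources".

```python
import math


def load_time_ms(resources, rtt_ms, in_flight):
    """Idealised load time: 'in_flight' requests complete per round trip."""
    round_trips = math.ceil(resources / in_flight)
    return round_trips * rtt_ms


RESOURCES = 10
RTT_MS = 100

# HTTP/1.1, one connection: one request per round trip.
print(load_time_ms(RESOURCES, RTT_MS, in_flight=1))    # 1000
# HTTP/1.1, four connections: ceil(10 / 4) = 3 round trips.
print(load_time_ms(RESOURCES, RTT_MS, in_flight=4))    # 300
# HTTP/2 multiplexing, best case: all ten requests in flight at once.
print(load_time_ms(RESOURCES, RTT_MS, in_flight=10))   # 100
```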
This diagram shows the differences, and there is an animated version too.
Note: HTTP/1.1 does have the concept of pipelining, which also allows multiple requests to be sent off at once. However, the responses still had to be returned in the order they were requested, in their entirety, so it is nowhere near as good as HTTP/2, even if conceptually it's similar. Not to mention the fact that this is so poorly supported by both browsers and servers that it is rarely used.
One thing highlighted in the comments below is how bandwidth impacts us here. Of course your Internet connection is limited by how much you can download, and HTTP/2 does not address that. So if those 10 resources discussed in the above examples are all massive print-quality images, then they will still be slow to download. However, for most web browsing, bandwidth is less of a problem than latency. So if those ten resources are small items (particularly text resources like CSS and JavaScript which can be gzipped to be tiny), as is very common on websites, then bandwidth is not really an issue - it's the sheer volume of resources that is often the problem, and HTTP/2 looks to address that. This is also why concatenation is used in HTTP/1.1 as another workaround, so for example all CSS is often joined together into one file: the amount of CSS downloaded is the same but by doing it as one resource there are huge performance benefits (though less so with HTTP/2, and in fact some say concatenation should be an anti-pattern under HTTP/2 - though there are arguments against doing away with it completely too).
To put it as a real world example: assume you have to order 10 items from a shop for home delivery:
HTTP/1.1 with one connection means you have to order them one at a time and you cannot order the next item until the last arrives. You can understand it would take weeks to get through everything.
HTTP/1.1 with multiple connections means you can have a (limited) number of independent orders on the go at the same time.
HTTP/1.1 with pipelining means you can ask for all 10 items one after the other without waiting, but then they all arrive in the specific order you asked for them. And if one item is out of stock then you have to wait for that before you get the items you ordered after that - even if those later items are actually in stock! This is a bit better but is still subject to delays, and let's say most shops don't support this way of ordering anyway.
HTTP/2 means you can order your items in any particular order - without any delays (similar to above). The shop will dispatch them as they are ready, so they may arrive in a different order than you asked for them, and they may even split items so some parts of that order arrive first (so better than above). Ultimately this should mean you 1) get everything quicker overall and 2) can start working on each item as it arrives ("oh that's not as nice as I thought it would be, so I might want to order something else as well or instead").
Of course you're still limited by the size of your postman's van (the bandwidth) so they might have to leave some packages back at the sorting office until the next day if they are full up for that day, but that's rarely a problem compared to the delay in actually sending the order across and back. Most of web browsing involves sending small letters back and forth, rather than bulky packages.
Since @Juanma Menendez's answer is correct but his diagram is confusing, I decided to improve upon it by clarifying the difference between multiplexing and pipelining, two notions that are often conflated.
Pipelining (HTTP/1.1)
Multiple requests are sent over the same HTTP connection. Responses are received in the same order. If the first response takes a lot of time, other responses have to wait in line. Similar to CPU pipelining, where an instruction is fetched while another one is being decoded. Multiple instructions are in flight at the same time, but their order is preserved.
Multiplexing (HTTP/2)
Multiple requests are sent over the same HTTP connection. Responses are received in arbitrary order. No need to wait for a slow response that's blocking others. Similar to out-of-order instruction execution in modern CPUs.
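The difference can also be shown with a toy model. This is a sketch, not a protocol implementation: it assumes all requests are sent at time zero, that `ready_ms[i]` is the moment the server has finished producing response i, and that transfer time is negligible. Both function names are made up for the illustration.

```python
from itertools import accumulate


def pipelined_delivery(ready_ms):
    """HTTP/1.1 pipelining: response i cannot be delivered before all
    earlier responses, so it waits for the slowest one ahead of it."""
    return list(accumulate(ready_ms, max))


def multiplexed_delivery(ready_ms):
    """HTTP/2 multiplexing: each response is delivered as soon as it is ready."""
    return list(ready_ms)


# The first response is slow (500 ms); the other two are fast.
ready = [500, 20, 30]
print(pipelined_delivery(ready))    # [500, 500, 500]
print(multiplexed_delivery(ready))  # [500, 20, 30]
```

With pipelining the two fast responses are stuck behind the slow one; with multiplexing they arrive as soon as they are ready.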
Hopefully the improved image clarifies the difference:
Request multiplexing
HTTP/2 can send multiple requests for data in parallel over a single TCP connection. This is the most advanced feature of the HTTP/2 protocol because it allows you to download web files asynchronously from one server. Most modern browsers limit the number of TCP connections to one server. Multiplexing reduces the additional round trip time (RTT), making your website load faster without any optimization, and makes domain sharding unnecessary.
Multiplexing in HTTP/2 is the type of relationship between the browser and the server that uses a single connection to deliver multiple requests and responses in parallel, creating many individual frames in this process.
Multiplexing breaks away from the strict request-response semantics and enables one-to-many or many-to-many relationships.
Simple answer (source):
Multiplexing means your browser can send multiple requests and receive multiple responses "bundled" into a single TCP connection. So the workload associated with DNS lookups and handshakes is saved for files coming from the same server.
Complex/detailed answer:
See the answer provided by @BazzaDP.

Batching generation of http responses

I'm trying to find an architecture for the following scenario. I'm building a REST service that performs some computation that can be quickly batch computed. Let's say that computing 1 "item" takes 50ms, and computing 100 "items" takes 60ms.
However, the nature of the client is that only 1 item needs to be processed at a time. So if I have 100 simultaneous clients, and I write the typical request handler that sends one item and generates a response, I'll end up using 5000ms, but I know I could compute the same in 60ms.
I'm trying to find an architecture that works well in this scenario. I.e., I would like to have something that merges data from many independent requests, processes that batch, and generates the equivalent responses for each individual client.
If you're curious, the service in question is python+django+DRF based, but I'm curious about what kind of architectural solutions/patterns apply here and if anything solving this is already available.
At first you could think of a reverse proxy detecting all pattern-specific queries, collecting all these queries and sending them to your application in an HTTP/1.1 pipeline (pipelining is a way to send a large number of queries one after another and receive all the HTTP responses in the same order at the end, without waiting for a response after each query).
But:
Pipelining is very hard to do well.
You would have to code the reverse proxy yourself, as I do not know of an existing way to do it.
One slow response in the pipeline blocks all the other responses.
You need an HTTP server able to hand several queries to your application at once, something which never happens if the HTTP server is not directly coded in your application, because usually HTTP handling works on only one query at a time (for example, you never receive 2 queries in a PHP environment: you receive the first one, send the response, and then receive the next one, even if the connection contains 2 queries).
So the good idea would be to do that on the application side. You could identify matching queries, and wait for a small amount of time (10ms?) to see if some other queries are also incoming. You will need a way to communicate between several parallel workers here (say you have 50 application workers and 10 of them have received queries that could be treated in the same batch). This means of communication could be a database (a very fast one) or some shared memory, depending on the technology used.
Then, when too much time has been spent waiting (10ms?) or when a large number of queries has been received, one of the workers could collect all the queries, run the batch, and tell all the other workers that a result is there (here again you need a central point of communication, like LISTEN/NOTIFY in PostgreSQL, shared memory, a message queue service, etc.).
Finally every worker is responsible for sending the right HTTP response.
The key here is having a system where the time you lose trying to share request handling matters less than the time saved by batching several queries together; under low traffic this lost time should stay reasonable (as here you will always lose time waiting for nothing). And of course you are also adding some complexity to the system, making it harder to maintain, etc.
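To make the idea concrete, here is a minimal sketch of this "wait a few milliseconds, then run one batch" pattern. It makes a simplifying assumption the answer does not: all request handlers run as threads inside one process, so an in-process queue can play the role of the central point of communication. With separate worker processes you would need one of the shared mechanisms mentioned above instead. The class name `MicroBatcher` and its parameters are invented for the example.

```python
import queue
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class MicroBatcher:
    """Collects items from many threads and processes them in batches."""

    def __init__(self, batch_fn, max_wait_s=0.01, max_batch=100):
        self._batch_fn = batch_fn      # takes a list, returns a list of equal length
        self._max_wait_s = max_wait_s  # how long to wait for more items
        self._max_batch = max_batch    # flush early when this many are queued
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        """Called by each request handler; blocks until its result is ready."""
        future = Future()
        self._queue.put((item, future))
        return future.result()

    def _run(self):
        while True:
            batch = [self._queue.get()]  # block until the first item arrives
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            try:
                results = self._batch_fn([item for item, _ in batch])
            except Exception as exc:
                # Propagate a batch failure to every waiting caller.
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                future.set_result(result)


def square_all(items):
    """Stand-in for the expensive computation that batches well."""
    return [x * x for x in items]


batcher = MicroBatcher(square_all, max_wait_s=0.01)
with ThreadPoolExecutor(max_workers=20) as pool:
    # Twenty simulated clients, each submitting a single item.
    print(list(pool.map(batcher.submit, range(20))))
```

Each caller still sees an ordinary one-item call, while the batch function runs far fewer times than there are callers when requests arrive close together. Under low traffic every request pays up to `max_wait_s` of extra latency, which is the trade-off described above.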

http HEAD vs GET performance

I am setting up a REST web service that just needs to answer YES or NO, as fast as possible.
Designing a HEAD service seems the best way to do it, but I would like to know if I will really gain some time versus doing a GET request.
I suppose I save the body stream being opened/closed on my server (about 1 millisecond?).
Since the number of bytes to return is very low, do I gain any time in transport, in the number of IP packets?
Edit:
To explain further the context:
I have a set of REST services executing some processes, if they are in an active state.
I have another REST service indicating the state of all these first services.
Since that last service will be called very often by a very large set of clients (one call expected every 5ms), I was wondering whether using the HEAD method could be a valuable optimization. About 250 chars are returned in the response body. The HEAD method at least saves the transport of these 250 chars, but how much impact does that have?
I tried to benchmark the difference between the two methods (HEAD vs GET), running 1000 times the calls, but see no gain at all (< 1ms)...
A RESTful URI should represent a "resource" at the server. Resources are often stored as a record in a database or a file on the filesystem. Unless the resource is large or is slow to retrieve at the server, you might not see a measurable gain by using HEAD instead of GET. It could be that retrieving the meta data is not any faster than retrieving the entire resource.
You could implement both options and benchmark them to see which is faster, but rather than micro-optimize, I would focus on designing the ideal REST interface. A clean REST API is usually more valuable in the long run than a kludgey API that may or may not be faster. I'm not discouraging the use of HEAD, just suggesting that you only use it if it's the "right" design.
If the information you need really is meta data about a resource that can be represented nicely in the HTTP headers, or to check if the resource exists or not, HEAD might work nicely.
For example, suppose you want to check if resource 123 exists. A 200 means "yes" and a 404 means "no":
HEAD /resources/123 HTTP/1.1
[...]
HTTP/1.1 404 Not Found
[...]
However, if the "yes" or "no" you want from your REST service is a part of the resource itself, rather than meta data, you should use GET.
I found this reply when looking for the same question that requester asked. I also found this at http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
It would seem to me that the correct answer to requester's question is that it depends on what is represented by the REST protocol. For example, in my particular case, my REST protocol is used to retrieve fairly large (as in more than 10K) images. If I have a large number of such resources being checked on a constant basis, and given that I make use of the request headers, then it would make sense to use HEAD request, per w3.org's recommendations.
GET fetches head + body, HEAD fetches head only. It should not be a matter of opinion which one is faster. I don't understand the upvoted answers above. If you are looking for meta information then go for HEAD, which is meant for this purpose.
I strongly discourage this kind of approach.
A RESTful service should respect the HTTP verbs semantics. The GET verb is meant to retrieve the content of the resource, while the HEAD verb will not return any content and may be used, for example, to see if a resource has changed, to know its size or its type, to check if it exists, and so on.
And remember: premature optimization is the root of all evil.
HEAD requests are just like GET requests, except the body of the response is empty. This kind of request can be used when all you want is metadata about a file but don't need to transport all of the file's data.
Your performance will hardly change by using a HEAD request instead of a GET request.
Furthermore when you want it to be REST-ful and you want to GET data you should use a GET request instead of a HEAD request.
I don't understand your concern of the 'body stream being open/closed'. The response body will be over the same stream as the http response headers and will NOT be creating a second connection (which by the way is more in the range of 3-6ms).
This seems like a very premature optimization attempt on something that just won't make a significant or even measurable difference. The real difference is conformity with REST in general, which recommends using GET to get data.
My answer is NO, use GET if it makes sense, there's no performance gain using HEAD.
You could easily make a small test to measure the performance yourself. I think the performance difference would be negligible, because if you're only returning 'Y' or 'N' in the body, it's a single extra byte appended to an already open stream.
I'd also go with GET since it's more correct. You're not supposed to return content in HTTP headers, only metadata.

Is the HEAD response faster than the GET?

I'm currently getting the info about the files with GET, will it be faster if I rewrite it using HEAD request? Cause I close the connection after the first response.
A HEAD response only includes the HTTP headers but no body - it is generally faster to just use a HEAD if you do not use any information in the body that would have normally transferred in a GET response - if there was no body to begin with it should not make a difference.
Also from here:
The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request SHOULD be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the entity implied by the
request without transferring the entity-body itself. This method is
often used for testing hypertext links for validity, accessibility,
and recent modification.
Whether HEAD is faster than GET depends purely on the server-side implementation (it usually is, due to less data transfer). If the information HEAD delivers is sufficient in your case, I would go with HEAD and only fall back to GET where HEAD is not implemented properly and/or some obscure proxy is messing with it.
You haven't given any information about the type of server you're accessing or network you're accessing it over.
It is indeed plausible that a HEAD request would complete faster than GET, since it involves less data transfer. However, on a fast or high latency connection this almost always won't matter. As for the server side, it really depends heavily on what you're doing, but in most circumstances there would be no measurable difference if you timed it.
If you don't need the body of the response, why not use HEAD anyway? Regardless of whether you can measure any difference in response time or you can't, it is more bandwidth-efficient.
It's probably negligible. It really depends what the server is doing. Once it receives a request, you can't guarantee to expect a response from a HEAD request or a GET request any quicker than the other.
In theory, because the response to a HEAD request should be the same as that of a GET request, but without the response body, it should be quicker because it's transferring less data. But there is no guarantee that one connection which processes a HEAD request will be any quicker than another connection processing a GET request.
The important thing to note with your question, is that you are talking about 'GET requests and HEAD requests' - instead of 'GET responses, and HEAD responses'
Logically - the request for a HEAD and a GET both take the same amount of time to travel from your PC to the server destination. Whatever that server does with the HEAD/GET will be up to the server owner, so they could make a HEAD take longer if they coded it to do so.
If you really want to get into semantics, you could argue that the word HEAD is one character longer than GET, therefore a HEAD request technically has to transmit 1 byte more of data in the request phase. In practice, this is going to be a non-measurable difference in request time.
If you were to start a timer from the moment both 'RESPONSES' left the server on their way back to the requester, then logically speaking, a GET response will take longer to travel across the network. Since it will usually consist of HEADERS and BODY - the BODY can be a huge amount of data.
A Head response will take less time to travel, because it is just HEADERS.
Using a really extreme example - if you send a GET request for a 4GB file, it will take minutes for that GET response to finish writing the data to your network stream.
A HEAD request for the same 4GB file will finish almost instantly, because it is only sending information that describes the 4GB file at a high level, without having to transmit its contents to the requester.
A GET response will encompass a HEAD + BODY.
A HEAD response will contain the HTTP Headers only.
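A small self-contained experiment makes this visible. The sketch below starts a throwaway local server using Python's standard library and sends it one GET and one HEAD; the handler, the 10,000-byte payload, and the `/resource` path are all made up for the demonstration and say nothing about how any particular real server behaves.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 10_000  # stand-in for a larger resource


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)   # headers + body

    def do_HEAD(self):
        self._send_headers()     # same headers, no body

    def log_message(self, *args):
        pass                     # keep the demo output quiet


def fetch(method, port):
    """Return (status, Content-Length header, bytes of body actually received)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/resource")
    response = conn.getresponse()
    body = response.read()
    result = (response.status, response.getheader("Content-Length"), len(body))
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = fetch("GET", port)
head_result = fetch("HEAD", port)
server.shutdown()
server.server_close()

print("GET :", get_result)
print("HEAD:", head_result)
```

Both responses carry the same status and the same Content-Length header, but only the GET actually transfers the 10,000 body bytes.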
I personally use HEAD requests in combination with a technology called IPFS - which is a type of distributed internet, where files and data can be stored on a P2P network. In order to keep files alive on the network, they need to be requested frequently. However, if you pull the file via a GET request, you end up using bandwidth, to download that 4GB file you stored weeks ago.
Performing a HEAD request however, in my case, keeps the file alive on the network, but does not request the 4GB of data to travel to me on the network.

Resources