HTTP2 does not yet support etags? - http2

I am currently making a server for dynamic and static files with Node. I'm trying to implement HTTP2. What surprises me is that it seems that the HTTP2 push does not support ETags!
When the client sends the headers to retrieve a file that starts with a push, and that it has accepted, it ignores the "IF-NONE-MATCH" header.
It's a waste, I do not understand the reason for this behavior. Is this the case or am I missing something?

As discussed in the comments, the server pushes the resource, so there is no client request and therefore no If-None-Match header carrying an ETag for the server to check.
So HTTP/2 does support ETags; they just have no relevance for pushed resources.
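For contrast, here is a rough sketch of the validation that happens on an ordinary (non-pushed) request, where the client does get to send If-None-Match. The `make_etag` and `respond` names and the hash-based ETag are my own assumptions for illustration, not anything Node or a particular server does for you:

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the body (one common choice, not the only one)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict[str, str], body: bytes) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    # Header names are case-insensitive, so normalise before the lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    candidates = [tag.strip() for tag in headers.get("if-none-match", "").split(",")]
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    if "*" in candidates or etag in [tag.removeprefix("W/") for tag in candidates]:
        return 304, {"etag": etag}, b""
    return 200, {"etag": etag}, body
```

With a push there is no request for `respond` to inspect, which is exactly why this check never gets a chance to run.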
And yes, this does mean cached resources are ignored for pushed resources, which is one of the big drawbacks of Push and why many people do not recommend using it. When a client sees the PUSH_PROMISE that a server sends before pushing a resource, it can reject it with a RST_STREAM frame, but by the time that makes it to the server, often a good part (if not all) of the resource will have already been pushed.
There are a few ways around this:
You could track what has already been pushed using cookies for example. I've a simple example with Apache config here: https://www.tunetheweb.com/performance/http2/http2-push/. Of course that assumes that cookies and cache are in sync but they may not be (they can be cleared independently).
Some servers track what has already been pushed. Apache, for example, allows an HTTP/2 push diary to be configured (set to 256 items by default) which tracks items pushed on that connection. If you visit page1.html and it pushes styles.css, and then you visit page2.html and it also attempts to push styles.css, Apache will not push it as it knows you already have it. However, that only works if you are using the same connection. If you come back later on a new connection and the resource is still in the browser cache, it will be re-pushed.
There was a proposal for Cache digests, which allow the browser to send an encoded list of what is in the cache at the start of any connection, and the server could use that to know whether to push an item or not. However work on that has been stopped recently as there were some privacy concerns about this.
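To make the cookie idea from the first option concrete, here is a minimal sketch. The cookie name `pushed`, the `|` separator, the one-day lifetime and the function names are all assumptions of mine for illustration; the Apache config in the linked post works differently in detail:

```python
PUSH_COOKIE = "pushed"  # hypothetical cookie name


def parse_cookies(header: str) -> dict[str, str]:
    """Split a Cookie request header into name/value pairs."""
    pairs = (item.split("=", 1) for item in header.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


def plan_pushes(cookie_header: str, candidates: list[str]) -> tuple[list[str], str]:
    """Return (resources worth pushing, Set-Cookie value recording them)."""
    recorded = parse_cookies(cookie_header).get(PUSH_COOKIE, "")
    already = {path for path in recorded.split("|") if path}
    # Only push what the cookie says this client has not been sent before.
    to_push = [path for path in candidates if path not in already]
    updated = "|".join(sorted(already | set(candidates)))
    return to_push, f"{PUSH_COOKIE}={updated}; Path=/; Max-Age=86400"
```

On a first visit `plan_pushes("", ["/styles.css"])` asks for the stylesheet to be pushed; on the next visit the cookie comes back and nothing is pushed. As noted above, this is only as good as the assumption that the cookie and the cache stay in sync.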
Ultimately HTTP/2 Push has proven tricky to make useful, and usage of it is incredibly low as a result. That is in large part due to the caching problem above, but also because Push is complex and there are other implementation issues. Even if all those were solved, it's still easy to over-push resources when perhaps it's best to let the browser request the resources in the order it knows it needs them. The Chrome team have even talked about turning it off and not supporting it.
Many are recommending using Early Hints with status code 103 instead, as it tells the browser what to request, rather than just pushing it. The browser can then use all its usual knowledge (what's in the cache, what priority it should be requested with, etc.) rather than overriding all this like Push does.
Cheap plug, but if interested in this then Chapter 5 of my recently published book discusses this all in a lot more detail than can be squeezed into an answer on Stack Overflow.
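As a rough illustration of what an Early Hints response carries, the sketch below serialises one in HTTP/1.1 wire form (over HTTP/2 the same status and Link headers travel in a HEADERS frame instead). The `early_hints` helper is hypothetical; only the 103 status and the `Link: ...; rel=preload` header are standard:

```python
def early_hints(preloads: list[tuple[str, str]]) -> bytes:
    """Serialise a 103 Early Hints interim response.

    Each entry is (url_path, as_type), for example ("/styles.css", "style").
    """
    lines = ["HTTP/1.1 103 Early Hints"]
    for path, as_type in preloads:
        # One Link header per resource the browser may want to fetch early.
        lines.append(f"Link: <{path}>; rel=preload; as={as_type}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

The server sends this before the final response is ready, and the browser then decides for itself whether each hinted resource is already cached or worth requesting.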

Related

CDN-server with http/1.1 vs. webserver with http/2

I have a hosted webserver with http/2 (medium fast) and additionally I have a space on a fast CDN-Server with only http/1.1.
Is it recommended to load some resources from the CDN or should I use only the webserver because of http/2?
Could loading too many resources from the CDN be a bottleneck due to http/1.1?
Would be kind to get some hints...
You need to test. It really depends on your app, your users and your servers.
Under HTTP/1.1 you are limited to 6 connections to a domain. So hosting content on a separate domain (e.g. static.example.com) or loading from a CDN was a way to increase that limit beyond 6. These separate domains are also often cookie-less, which is good for performance and security. And finally, if loading jQuery from code.jquery.com then you might benefit from the user already having downloaded it for another site, and so save that download completely (though with the number of versions of libraries and CDNs, the chance of having a commonly used library already downloaded and in the browser cache is questionable in my opinion).
However, separate domains require setting up a separate connection, which means a DNS lookup, a TCP connection and usually an HTTPS handshake too. This all takes time, and especially if downloading just one asset (e.g. jQuery) then those can often eat up any benefits from having the assets hosted on a separate site! This is in fact why browsers limit the connections to 6: there was a diminishing rate of return in increasing it beyond that. I've questioned the value of sharded domains for a while because of this, and people shouldn't just assume that they will be faster.
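A back-of-the-envelope estimate shows how the setup cost can outweigh the download itself. The sketch assumes one round trip for the TCP handshake and two for a TLS 1.2 handshake (one for TLS 1.3); the function names and any latency or bandwidth figures you plug in are illustrative assumptions, not measurements:

```python
def connection_setup_ms(rtt_ms: float, dns_ms: float, tls_round_trips: int = 2) -> float:
    """Estimated delay before the first request can be sent on a new HTTPS connection."""
    tcp_round_trips = 1  # SYN / SYN-ACK must complete before data flows
    return dns_ms + (tcp_round_trips + tls_round_trips) * rtt_ms


def transfer_ms(size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to move the payload once the connection exists (ignores slow start)."""
    return size_bytes * 1000 / bandwidth_bytes_per_s
```

With an assumed 50 ms round trip and a 20 ms DNS lookup, `connection_setup_ms(50, 20)` is 170 ms, while `transfer_ms(30_000, 1_000_000)` for a 30 KB file at 1 MB/s is 30 ms. Under those assumptions the new connection costs several times more than the asset it was opened to fetch.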
HTTP/2 aims to solve the need for separate domains (aka sharded domains) by removing the need for separate connections by allowing multiplexing, thereby effectively removing the limit of 6 "connections", but without the downsides of separate connections. They also allow HTTP header compression, reducing the performance downside to sending large cookies back and forth.
So in that sense I would recommend just serving everything from your local server. Not everyone will be on HTTP/2 of course, but the support is incredibly strong so most users should be.
However, the other benefit of a CDN is that they are usually globally distributed. So a user on the other side of the world can connect to a local CDN server, rather than come all the way back to your server. This helps with connection time (as TCP handshake and HTTPS handshake is based on shorter distances) and content can also be cached there. Though if the CDN has to refer back to the origin server for a lot of content then there is still a lag (though the benefits for the TCP and HTTPS setup are still there).
So in that sense I would advise using a CDN. However, I would say put all the content through this CDN rather than just some of it as you are suggesting, but you are right that HTTP/1.1 could limit the usefulness of that. That's weird though, as most commercial CDNs support HTTP/2, and you also say you have a "CDN server" (rather than a network of servers, plural), so maybe you mean a static domain rather than a true CDN?
Either way it all comes down to testing. As stated at the beginning of this answer, it really depends on your app, your users and your servers, and there is no one true, definite answer here.
Hopefully that gives you some idea of the things to consider. If you want to know more, because Stack Overflow really isn't the place for some of this and this answer is already long enough, then I've just written a book which spends large parts discussing all this: https://www.manning.com/books/http2-in-action

does it make sense to server-push woff2?

I'm reading quite a bit about http2's server-push. Also did some experimenting (on a beginner's level)...
Well, my question is: Does it make sense to server-push woff2 web-fonts? (since not every browser uses them), and, is there a method to push the correct font (if not already in the cache)?
Zach points out how important it is to have a fast font-delivery-solution, and CSS-Tricks (Chris Coyier) has a great method to get it done cache-aware...
Thank you!
david
Well that's an interesting question alright. The answer is: No you should not do this. But the reason is a little different than you might think...
For reasons that are a bit cryptic, fonts are always requested without credentials (basically cookies). For most browsers (Edge being the exception) this means the browser opens another connection for that request and this is important because HTTP/2 Pushes are linked to the connection. So if you push a resource on one connection, and the browser goes to get a resource from another connection it will not use that pushed resource (you do not push directly into the HTTP Cache as you might think).
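A toy model may make the connection issue easier to see. This is purely my own illustration of the behaviour described above (a push cache that belongs to one connection), not how any browser is actually implemented; the class and method names are made up:

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    credentialed: bool
    # Pushed responses wait here until a request on THIS connection claims them.
    push_cache: dict[str, bytes] = field(default_factory=dict)


@dataclass
class Browser:
    connections: dict[bool, Connection] = field(default_factory=dict)

    def connection_for(self, credentialed: bool) -> Connection:
        """Credentialed and non-credentialed requests use separate connections."""
        return self.connections.setdefault(credentialed, Connection(credentialed))

    def fetch(self, url: str, credentialed: bool) -> str:
        conn = self.connection_for(credentialed)
        if url in conn.push_cache:
            conn.push_cache.pop(url)  # a pushed response can be claimed once
            return "used pushed response"
        return "sent a new request"
```

If the server pushes `/font.woff2` on the credentialed connection that carried the page request, `fetch("/font.woff2", credentialed=False)` still returns "sent a new request", because the font request looks in a different connection's push cache.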
This, and lots of other HTTP/2 Push trickiness and edge cases were discussed by Jake Archibald in his excellent HTTP/2 push is tougher than I thought article.
But it does beg the question of how you can decide what format to push even if this wasn't an issue, or if you wanted to send different image formats for example (that would be on the same connection). Other than looking at the User-Agent and guessing based off of that, there is no way for you to know what the browser supports.
There is a new HTTP Client Hints header currently being proposed which aims to allow the browser to indicate the device specifics. This currently is more concerned with image size and density, but could in theory also include the file formats that are supported.

What's the Best Way to Open a TCP Stream to Server?

Rather a hard-to-nail-down question, but basically I'm wondering what the best way (and not "what's your opinion" but "which will most adequately meet the requirement I shall set forth") is to open a stream connection from a client-side webpage to a server such that either can send data to the other without polling. I'm thinking the term for this is HTTP binding vs. HTTP polling. The context here is a chat application: I'd like a streamed connection so that the browser isn't constantly pushing requests out. The client end here is KnockoutJS and jQuery. I'd like to be able to have the data pushed back and forth be JSON (or at least manipulatable by jQuery and Knockout's toJSON). The server end: not quite sure what it is going to be, but I'll probably be running on a Linux server, so anything compatible with that works fine.
If there are any more details I can provide, just let me know. I'm sure I left some obvious detail out. Also, I'm aware there's probably a duplicate question on this, so if your answer is as good as closing for a dupe and putting in a link, that's great.
Thanks!
I think what you're looking for is referred to as Comet. The basic idea is to keep HTTP requests open for longer periods of time so that the server can send data to the client as it comes in, rather than the client having to continually poll the server for new data. There are multiple ways to implement it. This Wikipedia article is a good start for more info.
This MIX 2011 video discusses the long polling technique (although the suggestion in the video is that web sockets will be a better solution with future browsers).
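The server side of long polling can be sketched in a few lines: each request blocks until there is something to send or a timeout expires, and the client immediately asks again. This is a minimal, framework-free sketch; `LongPollChannel` and `publish_later` are hypothetical names, and a real chat server would hold one such mailbox per connected client:

```python
import queue
import threading


class LongPollChannel:
    """Server-side mailbox for one client, drained by long-poll requests."""

    def __init__(self) -> None:
        self._messages = queue.Queue()

    def publish(self, message: str) -> None:
        self._messages.put(message)

    def poll(self, timeout_s: float) -> list[str]:
        """Block until a message arrives or the timeout expires, then return all queued messages."""
        try:
            batch = [self._messages.get(timeout=timeout_s)]
        except queue.Empty:
            return []  # nothing happened; the client simply re-issues the request
        while True:
            try:
                batch.append(self._messages.get_nowait())
            except queue.Empty:
                return batch


def publish_later(channel: LongPollChannel, delay_s: float, message: str) -> None:
    """Simulate another user sending a chat message after a delay."""
    threading.Timer(delay_s, channel.publish, args=(message,)).start()
```

The HTTP handler for the poll endpoint would call `poll()` and serialise the returned list as JSON, so the browser only sees a response when there is actually something new (or when the timeout fires).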

How to most quickly get small, very frequent updates from a server?

I'm working on the design of a web app which will be using AJAX to communicate with a server on an embedded device. But for one feature, the client will need to get very frequent updates (>10 per second), as close to real time as possible, for an extended period of time. Meanwhile typical AJAX requests will need to be handled from time to time.
Some considerations unique to this project:
This data will be very small, probably no more than a single numeric value.
There will only be 1 client connected to the server at a time, so scaling is not an issue.
The client and server will reside on the same local network, so the connection will be fast and reliable.
The app will be designed for Android devices, so we can take advantage of any platform-specific browser features.
The backend will most likely be implemented in Python using WSGI on Apache or lighttpd, but that is still open for discussion.
I'm looking into Comet techniques including XHR long polling and hidden iframe but I'm pretty new to web development and I don't know what kind of performance we can expect. The server shouldn't have any problem preparing the data, it's just a matter of pushing it out to the client as quickly as possible. Is 10 updates per second an unreasonable expectation for any of the Comet techniques, or even regular AJAX polling? Or is there another method you would suggest?
I realize this is ultimately going to take some prototyping, but if someone can give me a ball-park estimate or better yet specific technologies (client and server side) that would provide the best performance in this case, that would be a great help.
You may want to consider WebSockets. That way you wouldn't have to poll, you would receive data directly from your server. I'm not sure what server implementations are available at this point since it's still a pretty new technology, but I found a blog post about a library for WebSockets on Android:
http://anismiles.wordpress.com/2011/02/03/websocket-support-in-android%E2%80%99s-phonegap-apps/
For a Python back end, you might want to look into Twisted. I would also recommend the WebSocket approach, but failing that, and since you seem to be focused on a browser client, I would default to HTTP Streaming rather than polling or long-polls. This jQuery Plugin implements an http streaming Ajax client and claims specifically to support Twisted.
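To show what HTTP streaming means at the wire level, here is a small sketch that frames each update as one chunk of an HTTP/1.1 chunked response, so a single long-lived response can carry many small values. The `chunked_stream` name and the newline-delimited payload are assumptions for illustration, and a server or framework may well do this framing for you:

```python
from typing import Iterable, Iterator


def chunked_stream(updates: Iterable[str]) -> Iterator[bytes]:
    """Encode each update as one HTTP/1.1 chunk, then emit the terminating chunk."""
    for update in updates:
        # Newline-delimit the values so the client can split them apart again.
        payload = (update + "\n").encode("utf-8")
        # A chunk is: size in hex, CRLF, data, CRLF.
        yield f"{len(payload):X}\r\n".encode("ascii") + payload + b"\r\n"
    # A zero-length chunk marks the end of the response body.
    yield b"0\r\n\r\n"
```

The response would be sent with `Transfer-Encoding: chunked`, and the client reads values as they arrive rather than issuing ten requests per second.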
I am not sure if this would be helpful at all but you may want to try Comet style ajax
http://ajaxian.com/archives/comet-a-new-approach-to-ajax-applications

Publicly available Web proxy forward cache logs/data sets

I'm looking to do some analysis on HTTP requests that occur between clients and web servers. Are there any recent (at least within last 4 years) publicly available data sets of web proxy forward cache logs, such as those recorded by a Squid proxy? I'm most interested in forward cache HTTP log data - so coming from a cache that sits between many clients and many servers. I'd have an auxiliary interest in reverse proxy data, such as a proxy that serves up HTTP responses on behalf of a single server, though a proxy log that spans many clients and many servers would be preferable.
I'm after basically as much data as I can get and the larger the number of clients represented in the data the better. I imagine universities/large corporations might have such data logs, though haven't been able to find any publicly available (and hence this question).
Thanks.
It used to be quite common, e.g., the NLANR traces, the DEC traces, etc. However, in the last few years no-one seems willing to share traces, perhaps because of privacy concerns (even with anonymisation of the client IP, cookies and URL).
See http://www.web-caching.com/traces-logs.html for some older ones.

Resources