Caching and HTTP/2 - performance

I'm working on a site running on HTTP/2 and I noticed the following caching setup:
cache-control:max-age=604800, private
etag:W/"115-54e8a25e7b187"
expires:Fri, 14 Jul 2017 11:39:45 GMT
last-modified:Tue, 02 May 2017 13:02:11 GMT
Some questions about this setup:
Is it not a problem that Cache-Control and Expires specify different time intervals?
Are ETag and Last-Modified not redundant?
Is there something else that should be done in terms of caching for performance if HTTP/2 is in use?

Answers to your questions:
No, it's not a problem. Cache-Control is used in preference to Expires if both are specified. Mostly, though, web servers set them to equivalent values.
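For illustration, many servers will emit both headers from a single setting so they stay consistent; a minimal nginx sketch (the /static/ path and 7-day lifetime are just placeholders):
location /static/ {
    expires 7d;   # emits both Expires and Cache-Control: max-age=604800, with matching values
}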
Both ETag and Last-Modified are used for conditional requests, with ETag taking preference (similar to Cache-Control and Expires). Last-Modified has the benefit of being more human readable, while ETags allow the validator to be based on something other than a date. Some implementations of ETags have issues, though (I don't recommend using them on Apache servers, for example, as I describe here: https://www.tunetheweb.com/performance/http-performance-headers/etag/ ).
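As a sketch of how those values are used: once the cached copy goes stale, the browser revalidates with a conditional request carrying them, and if nothing has changed it gets back an empty 304 instead of the full body (the path and host here are placeholders):
GET /page HTTP/1.1
Host: www.example.com
If-None-Match: W/"115-54e8a25e7b187"
If-Modified-Since: Tue, 02 May 2017 13:02:11 GMT

HTTP/1.1 304 Not Modified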
HTTP/2 doesn't change anything in terms of caching, so the same headers and controls are used as under HTTP/1. There are a lot of other performance benefits to it. Server Push, however, raises interesting questions about how to push only resources that are not already in the cache (using cache digests or some sort of cookie-based system to tell the server what the client has already cached). But the basics of caching with the HTTP headers mentioned above stay the same.
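For example, in nginx (1.13.9+) a push can be configured like this, but note the server has no standard way of knowing whether the client already has the pushed file cached, which is exactly the problem cache digests aim to solve (the paths below are placeholders):
location = /index.html {
    http2_push /css/style.css;   # push the stylesheet alongside the HTML response
}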

Related

Is it worth comparing CSP headers with HSTS headers?

I am working on a project to add the HSTS header to a web application. As preparatory work, I have added a CSP header in report-only mode with the default-src https: directive. The intent is to assess the violations and decide whether adding the HSTS header is going to break any use case.
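For reference, a report-only policy like that would look something like this (the report endpoint name is just a placeholder):
Content-Security-Policy-Report-Only: default-src https:; report-uri /csp-reports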
Questions:
Is it a worthy approach?
What are the scenarios that we will miss for HSTS with this approach?
What are the recommended approaches, if they are different from the one I have described above?
I would have thought it was pretty obvious whether you are using HTTPS only on your server. Do you have HTTPS set up? Do you have a redirect set up to force all HTTP requests over to HTTPS? These are questions that should be easy to answer by looking at your server config, and I don't think you need CSP to answer them. And if the answer to both is not "yes", then why are you considering HSTS?
CSP support is not universal (though admittedly neither is HSTS) and, with these things, it's usually the older, less common browsers that break, since you'll presumably only test the common ones. So whether your approach will give you the confidence you need to proceed is debatable.
The one thing you should be aware of is whether you are using the includeSubDomains flag. This can cause problems if it affects more servers than you intend it to, and CSP will not help, as you will presumably only set up CSP on the servers you think will be affected. More info here: https://serverfault.com/questions/665234/problems-using-hsts-header-at-top-level-domain-with-includesubdomains.
Also be aware that, once HSTS is implemented, certificate errors can no longer be bypassed in the browser. Not that you should do this anyway, but it is another intentional effect of the header that not everyone knows about.
Note that the only way to resolve HSTS issues (e.g. if you discover you need HTTP after all), other than removing the header and waiting for the policy to expire, is to set the max-age back to zero and hope people visit your site over HTTPS to pick up the new policy and override the previous one. In Chrome it is possible to manually view and remove the policy, but that's not practical if you have any volume of visitors, and I'm not aware of any way to do this in other browsers short of a full reinstall.
The best approach is to be fully aware of what HSTS is and of the caveats above, then start with a low expiry and build it up slowly as long as you do not experience any issues.
There is also the preload list, but I would stay well away from that until you've been running HSTS for at least six months with no issues.
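A cautious rollout might look something like this (the max-age values are only illustrative):
Strict-Transport-Security: max-age=86400                                   (start: one day)
Strict-Transport-Security: max-age=31536000; includeSubDomains             (later: one year, subdomains included only if you're sure)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload    (only once you're ready to submit to the preload list)
Strict-Transport-Security: max-age=0                                       (emergency back-out, as described above)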

How can I know whether the response I receive from a web server is a cached response or not

I am using a REST API through which I call the salesforce.com API. I am performing load testing, stress testing, etc., so I am concerned about whether my responses are cached or not. I am receiving responses in JSON format.
The other question is: do I really have to worry about my responses being cached while performing performance testing?
thanks in advance
First of all: will the salesforce.com API be called during the test? If so, do you have an agreement with the Salesforce team (or maybe they provide a specific API for load testing)? If you do not have such an agreement, you should build a stub of the Salesforce API and use it during the test, because your test may cause problems on the Salesforce side or (more likely) get you banned.
Caching responses from your app/site is generally unwanted, because in load testing you're usually interested in the worst-case scenario. Alternatively, if you know which responses will come from cache, you can tune your test accordingly.
Most HTTP servers and proxies understand the following request headers:
Cache-Control: no-cache
Pragma: no-cache
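For example, a load-test request that asks intermediaries not to serve a cached copy might look like this (the host and path are placeholders):
GET /api/resource HTTP/1.1
Host: api.example.com
Cache-Control: no-cache
Pragma: no-cache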
The only general way of checking whether it works is to inspect your backend server logs, or the caching server logs if answers served from cache are marked somehow.
Depending on the caching solution you use, it's possible to mark cached responses in some other way; e.g. in nginx you can add a header:
X-Cache: $upstream_cache_status
See details on http://wiki.nginx.org/HttpUpstreamModule#.24upstream_cache_status
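A minimal sketch of how that header fits into an nginx proxy-cache setup (the cache path, zone name and backend address are made up for illustration, and this belongs inside the http block):
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8080;            # your backend application
        proxy_cache app_cache;                       # serve repeat requests from the cache zone
        add_header X-Cache $upstream_cache_status;   # HIT, MISS, EXPIRED, BYPASS, ...
    }
}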

Downside(s) of using HTTPS only on parts of the site

I am managing a shop that forces HTTPS on the register/login/account/checkout pages, but that's it, and I've been trying to convince people to force HTTPS on everything.
I know that it's recommended to use HTTPS everywhere, but I'm not sure why.
Are there any good reasons to keep part of the site on HTTP?
One good reason is that page performance has a massive impact on sales (there are lots of published studies) and SSL has a BIG impact on performance - particularly if it's not tuned right.
But running a mixed SSL and non-SSL site is full of pitfalls for the unwary...
Exactly which pages you put inside SSL has a big impact on security too, though. Suppose you serve a login form over HTTP with a POST target that is HTTPS - a trivial analysis would suggest this is secure, but in fact a MITM could modify the login page to send the POST elsewhere, or inject some Ajax to fork a request to a different location.
Further, with mixed HTTP and HTTPS you've got the problem of transferring sessions securely - the user fills their session-linked shopping basket outside the SSL site, then pays for it inside the SSL site - how do you prevent session fixation problems in the transition?
Hence I'd only suggest running a mixed site if you've got really expert skills in HTTP - and since you're asking this question here, that rather implies you don't.
A compromise solution is to use SPDY. SPDY requires SSL but makes most sites (especially ones that have not been heavily performance optimized) much faster. Currently it's not supported by MSIE, and (last time I checked) it is not enabled by default in Firefox. But it's likely to form a large part of HTTP/2.0.
Using (good) CDNs over HTTPS also mitigates much of the performance impact of SSL.
There's really no need to use HTTPS on the whole website. Using HTTPS will cause the server to consume more resources as it has to do extra work to encrypt and decrypt the connection, not to mention extra steps/handshake in negotiating algorithms etc.
If you have a heavy traffic website, the performance hit can be quite big.
This also means slower response times than plain HTTP.
You should only really use HTTPS on the parts of the site that actually need to be secure, such as whenever the user sends important information to your site, completes forms, logs in, accesses private parts of the site, etc.
One other issue arises if you use resources from non-secure URLs, such as images/scripts hosted elsewhere. If they are not available over HTTPS then your visitors will get a warning about an insecure connection.
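If the third party does serve the resource over HTTPS, referencing it that way avoids the warning (cdn.example.com is a placeholder host):
<script src="https://cdn.example.com/lib.js"></script>   instead of   <script src="http://cdn.example.com/lib.js"></script>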
You also need to realise that HTTPS data/pages will hardly ever get cached by intermediaries, which adds a further performance penalty.

Do services like Cloudflare and Incapsula actually improve the performance of websites hosted on Windows Azure?

I'm running an image-heavy website hosted on Windows Azure. Back-end performance is great but response times for image thumbnails, which make up the bulk of the page size, are quite volatile. I'm using the Azure CDN for serving all images but their response times vary by orders of magnitude and I haven't found any pattern in the fast (~150 milliseconds) vs slow (3-4 seconds) requests yet. This also doesn't seem to be a local phenomenon since I've tested the load times from different locations/continents. My conclusion so far is that the Azure CDN is simply not that good after all and I started looking for other ways to improve the load times of static assets.
Now that the context is clear, here is my actual question: does anyone have experience with services like Cloudflare and Incapsula for improving the performance of websites hosted on cloud infrastructure like Windows Azure? These services promise reduced server load among other things, but I'm more interested if they are actually effective in reducing response times for static files, as well as any negative impact on dynamic page content. I'd greatly appreciate any answers based on practical experience and/or advice for alternative solutions.
UPDATE:
Here are the response headers for one of the images on the CDN:
HTTP/1.1 200 OK
Cache-Control: public, max-age:31536000
Content-Length: 4245
Content-Type: image/jpeg
Last-Modified: Sat, 21 Jan 2012 12:14:33 GMT
ETag: 0x8CEA64D5EC55FB6
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: d7a1ef38-6c99-4b38-a9f5-987419df5d24
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Date: Sun, 05 Feb 2012 12:56:12 GMT
Connection: keep-alive
Incapsula has two caching modes:
1) Basic - this mode caches static content according to directives in the HTTP headers (in the same way a browser or a commercial caching proxy would behave). This typically provides a 30%-50% improvement.
2) Advanced - this mode also caches static content that was not specified in the HTTP headers, as well as dynamic content, by using learning capabilities to determine what content is cacheable and when to expire the cache. These methods are optimized to strike the right balance between maximum caching and serving fresh/up-to-date content. This mode typically adds a further 20%-30% improvement.
"These services promise reduced server load among other things, but I'm more interested if they are actually effective in reducing response times for static files, as well as any negative impact on dynamic page content. I'd greatly appreciate any answers based on practical experience and/or advice for alternative solutions."
We actually wouldn't impact your dynamic content, so everything should be good to go there.
We do cache static content. Most users see about a 50-60% improvement in performance overall, so you should definitely see an improvement with static resources that are on your server.

Good practice or bad practice to force entire site to HTTPS?

I have a site that works very well when everything is in HTTPS (authentication, web services, etc.). If I mix HTTP and HTTPS it requires more coding (cross-domain problems).
I don't seem to see many web sites that are entirely in HTTPS so I was wondering if it was a bad idea to go about it this way?
Edit: Site is to be hosted on Azure cloud where Bandwidth and CPU usage could be an issue...
EDIT 10 years later: The correct answer is now to use https only.
You lose a lot of features with HTTPS (mainly related to performance):
Proxies cannot cache pages
You cannot use a reverse proxy for performance improvement
You cannot host multiple domains on the same IP address
Obviously, the encryption consumes CPU
Maybe that's no problem for you, though; it really depends on your requirements.
HTTPS decreases server throughput, so it may be a bad idea if your hardware can't cope with it. You might find this post useful. This (academic) paper also discusses the overhead of HTTPS.
If you have HTTP requests coming from an HTTPS page, you'll force the user to confirm the loading of insecure data. Annoying on some websites I use.
This question and especially the answers are OBSOLETE. This question should be tagged: <meta name="robots" content="noindex"> so that it no longer appears in search results.
To make THIS answer relevant:
Google is now penalizing website search rankings when they fail to use TLS/https. You will ALSO be penalized in rankings for duplicate content, so be careful to serve a page EITHER as http OR https BUT NEVER BOTH (Or use accurate canonical tags!)
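A minimal sketch of enforcing a single canonical scheme, assuming an nginx front end and a placeholder domain:
server {
    listen 80;
    server_name example.com;
    return 301 https://$host$request_uri;   # every HTTP request is redirected to its HTTPS equivalent
}
And/or mark the preferred version in the page itself, e.g. <link rel="canonical" href="https://example.com/some-page/">.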
Google is also aggressively indicating insecure connections which has a negative impact on conversions by frightening-off would-be users.
This is in pursuit of a TLS-only web/internet, which is a GOOD thing. TLS is not just about keeping your passwords secure — it's about keeping your entire world-facing environment secure and authentic.
The "performance penalty" myth is really just based on antiquated obsolete technology. This is a comparison that shows TLS being faster than HTTP (however it should be noted that page is also a comparison of encrypted HTTP/2 HTTPS vs Plaintext HTTP/1.1).
It is fairly easy and free to implement using LetsEncrypt if you don't already have a certificate in place.
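For example, with certbot a certificate can typically be obtained and installed in one step (example.com is a placeholder, and the --nginx plugin assumes an nginx server):
certbot --nginx -d example.com -d www.example.com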
If you DO have a certificate, then batten down the hatches and use HTTPS everywhere.
TL;DR, here in 2019 it is ideal to use TLS site-wide, and advisable to use HTTP/2 as well.
</soapbox>
If you have no side effects then you are probably okay for now, and might be happy not to create work where it is not needed.
However, there is little reason to encrypt all your traffic. Certainly login credentials or other sensitive data need it. One of the main things you would be losing out on is downstream caching: your servers, the intermediate ISPs and users cannot cache HTTPS content. This may not be completely relevant, as it sounds like you are only providing services. It completely depends on your setup, whether there is opportunity for caching, and whether performance is an issue at all.
It is a good idea to use all-HTTPS - or at least provide knowledgeable users with the option for all-HTTPS.
If there are certain cases where HTTPS is completely useless and in those cases you find that performance is degraded, only then would you default to or permit non-HTTPS.
I hate running into pointlessly all-HTTPS sites that handle nothing that really requires encryption, mainly because they all seem to be 10x slower than every other site I visit. For example, most of the documentation pages on developer.mozilla.org force you to view them over HTTPS for no reason whatsoever, and they always take long to load.
