Conditional GET Request and Expiry Header testing with Firebug-NET - performance

I'm using Firebug's NET feature to measure the performance of our application. I'm a bit confused the way it is displaying the timeline. We have enabled Expiry header for all static files(it is 30 days from the current date). Now even if the resource is available in cache, it still makes a conditional GET (that is what I think). Ideally there shouldn't make a connection to the server, but it takes 93ms to create a connection. Please find the image that I've attached.
Can some one please help me to understand this better?

The HTTP response contains a header entry "Etag". ETag is a cache validator tag.
The HTTP Client on seeing this response will always verify with the server if the Content has been updated.
Cache Validator tag has higher preference over other Cache control tags.
If you want content to be served from cache without it being validate on the server side then only keep the Expires header and remove the ETag header.

Related

Embedded google drive api to show a pdf returns 204

My website has an iframe pointing to https://drive.google.com/viewer?url=https://mywebsite/myfile.pdf&embedded=true
Most of the times, the pdf loads correctly, but sometimes it doesn't, I get just a blank page. The request seems to be returning 204 (request successful - response empty).
I could even replicate this, by entering the url above directly on the browser, and refreshing multiple times, until I got a 204, so it is not something on my website and/or the iframe.. any idea why this happens? and how to prevent it.
Thanks in advance :)
The error HTTP Status 204 (No Content) indicates that the server has successfully fulfilled the request and that there is no content to send in the response payload body. The server might want to return updated meta information in the form of entity-headers, which if present SHOULD be applied to current document’s active view if any.
By default, 204 (No Content) response is cacheable. If caching needs
to be overridden then response must include cache respective cache
headers.
In order to solve this issue, the lost update problem, the server may also include HTTP header ETag to let the client validate client side resource representation before making further update on server:
Lost update problem happens when multiple people edit a resource
without knowledge of each other’s changes. In this scenario, the last
person to update a resource “wins”, and previous updates are lost.
ETags can be used in combination with the If-Match header to let the
server decide if a resource should be updated. If ETag does not match
then server informs the client via a 412 (Precondition Failed)
response.
Please check this site for more details.

Prevent AJAX POST responses from being cached

I have AJAX POST requests generated from my webpage, and there may be multiple post requests with the same post data. But the response may vary, and I want to make sure I am not getting cached responses to any of these requests. I need each request to hit the webpage.
Am I right in assuming that responses to POST requests will not be cached?
There is two level of caching will be involved in that process
Browser caching
Server caching
To eliminate first one you have to cheat your browser and add a fake parameter to your ajax request so it will think it's unique each time i.e
www.example.com/api/ajax?123
www.example.com/api/ajax?1234
For server level you have to make sure that no cache been added to your configuration for such link, for example some developer will cache any file ends with .json or service like Cloud Flare it will automatically cache any static content.

How to invalidate the cache of an arbitrary URL?

According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10 clients must invalidate the cache associated with a URL after a POST, PUT, or DELETE request.
Is it possible to instruct a web browser to invalidate the cache of an arbitrary URL, without making an HTTP request to it?
For example:
PUT /companies/Nintendo creates a new company called "Nintendo"
GET /companies lists all companies
Every time I create a new company, I want to invalidate the cache associated with GET /companies. The browser doesn't do this automatically because the two operate on different URLs.
Is the Cache-Control mechanism inappropriate for this situation? Should I use no-cache along with ETag instead? What is the best-practice for this situation?
I know I can pass no-cache the next time I GET /companies but that requires the application to keep track URL invalidation instead of pushing the responsibility to the browser. Meaning, I want to invalidate the URL after step 1 as opposed to having to persist this information and applying it at step 2. Any ideas?
Yes, you can (within the same domain). From this answer (slightly paraphrased):
In response to a PUT or POST request, if the Content-Location header URI is different from the request URI, then the cache for the Content-Location URI is invalidated.
So in your case, include a Content-Location: /companies header in response to your POST request. This will invalidate the browser's cached version of /companies.
Note that this does not work for GET requests.
No, in HTTP/1.1 you may only invalidate a client's cache for a resource in a response to a request for that resource. It may be in response to a PUT, POST or DELETE rather than a GET (see RFC 7234, section 4.4 for details).
If you have a resource where you need clients to confirm that they have the latest version then no-cache and an entity tag is an ideal solution.
HTTP/2 allows for pushing a cache clear (Nine Things to Expect from HTTP/2 4. Cache Pushing).
In the link which you have given "the phrase "invalidate an entity" means that the cache will either remove all instances of that entity from its storage, or will mark these as "invalid" and in need of a mandatory revalidation before they can be returned in response to a subsequent request.". Now the question is where are the caches? I believe the Cache the article is talking about is the server cache.
I have worked on a project in VC++ where whenever a model changes the cache is updated. There is a programming logic implemention involved to achieve this. Your mentioned article rightly says "There is no way for the HTTP protocol to guarantee that all such cache entries are marked invalid" HTTP Protocol cannot invalidate cache on its own.
In our project example we used publish subscribe mechanism. Wheneven an Object of class A is updated/inserted it is published to a bus. The controllers register to listen to objects on the Bus. Suppose A Controller is interested in Object A changes, it will not be called back whenever Object Type B is changed and published. When Object Type A indeed is changed and published then Controller A Listener function updates the Cache with latest changes of Object A. The subsequent request of GET /companies will get the latest from the cache. Now there is a time gap between changing the object A and the Cache being refreshed with the latest changes. To avoid something wrong happening in this time gap Object is marked dirty before the Object A Changes. So a request coming inbetween of these times will wait for dirty flag being cleared.
There is also a browser cache. I remember ETAGS are used to validate this. ETAG is the checksum of the resource. For this Client should maintain old ETAG value somehow. If the checksum of resource has changed then the new resource with HTTP 200 is sent else HTTP 304 (use local copy) is sent.
[Update]
PUT /companies/Nintendo
GET /companies
are two different resources. Your the cache for /companies/Nintendo is only expected to be updated and not /companies (I am talking of client side cache) when PUT /companies/Nintendo request is executed. Suppose you call GET /companies/Nintendo next time, based on http headers the response is returned. GET /companies is a brand new request as it points to different resource.
Now question is what should be the http headers? It is purely application specific. Suppose it is stock quote I would not cache. Suppose it is NEWS item I would cache for certain time. Your reference link http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html has all the details of Cache http headers. Only thing not mentioned much is ETag usage. ETag can have checksum of resource. Check http://en.wikipedia.org/wiki/HTTP_ETag and also check https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers

No expires header sent, content cached, how long until browser makes conditional GET request?

Assume browser default settings, and content is sent without expires headers.
user visits website, browser caches images etc.
user does not close browser, or refresh page.
user continues to surf site normally.
assume the browse doesn't dump the cache for any reason.
The browser will cache images etc as the user surfs, but it's unclear when it will issue a conditional GET request to ask about content freshness (apart from refreshing the page). If this is a browser specific setting, where can I see it's value (for browsers like: safari, IE, FireFox, Chrome).
[edit: yes - I understand that you should always send expires headers. However, this research is aimed at understanding how the browser works with content w/o expires headers.]
From the the HTTP caching spec (section 13.4): Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation. This means that a user agent is free to do whatever it wants if no cache control header is sent. Most browsers use a combination of user settings and heuristics to determine whether (and how long) to cache in this situation.
HTTP/1.1 defines a selection of caching mechanisms; the expires header is merely one, there is also the cache-control header.
To directly answer your question: for a resource returned with no expires header, you must consider the returned cache-control directives.
HTTP/1.1 defines no caching behaviour for a resource served with no cache-related headers. If a resource is sent with no cache-control or expires headers you must assume the client will make a regular (non-conditional) request the next time the same resources is requested.
Any deviation from this behaviour qualifies the client as being not a fully conformant HTTP client, in which case the question becomes: what behaviour is to be expected from a non-conformant HTTP client? There is no way to answer that.
HTTP caching is complex, to fully understand what a conformant client should do in a given scenario, read and understand the HTTP caching spec.
Unless you send an expires header, most browsers will make a GET request for each subsequent refresh and will either get HTTP 200 OK (it will download the content again) or HTTP 304 Not Modified (and use the data in cache).

How to detect whether web page content is different from cached version

Hi guys
As you know checking process of web pages content is a little different from static pages or personal files on our machines because content of Dynamic web pages are changed on each request. So if we are going to use checksums to identifying changes, We'll fail! very simple example is when site owner are use Google Ads on him website; on each request Ads are different from previous. Also if we are going to cache only on period time, also We'll fail, because some web pages aren't updated every years but some are every minutes (if not seconds).
So what is better approach to solve this issue? (Thanks)
UPDATE
Another option is use of LastModified http-header! but this is strong approach?
Browsers do this automatically with the help of the several caching mechanisms that HTTP provides. The two mechanisms most obviously useful for determining whether a page has changed is the concept of Entity Tags and the Last Modified HTTP header. These mechanisms allow the browser to make conditional requests to a web site, eg. fetch a page only if it has been changed.
Quoting RFC 2616 on HTTP 1.1:
3.11 Entity Tags
Entity tags are used for comparing two or more entities from the same
requested resource. HTTP/1.1 uses entity tags in the ETag (section
14.19), If-Match (section 14.24), If-None-Match (section 14.26), and
If-Range (section 14.27) header fields. The definition of how they
are used and compared as cache validators is in section 13.3.3. An
entity tag consists of an opaque quoted string, possibly prefixed by
a weakness indicator.
The key point here is that the ETag is a cache validator. If a browser has a cached version of a page (called a resource in the RFC), it can use the ETag to determine whether the cached page is still valid, ie. if the page hasn't changed on the server.
And about the modification date:
14.25 If-Modified-Since
The If-Modified-Since request-header field is used with a method to
make it conditional: if the requested variant has not been modified
since the time specified in this field, an entity will not be
returned from the server; instead, a 304 (not modified) response will
be returned without any message-body.
The key point here is that the server may know when a page has been modified, and may then inform the client.
If you open a HTTP monitor (such as Fiddler for Windows) and watch your browser communicate with web sites, you'll see the use of these mechanisms first-hand when the browser makes conditional requests.
To specifically address your question about the Last Modified header, this header in itself won't work for the majority of pages you'll find. But in combination with the ETag it can get you started.

Resources