No expires header sent, content cached, how long until browser makes conditional GET request? - caching

Assume browser default settings, and content is sent without expires headers.
The user visits a website, and the browser caches images etc.
The user does not close the browser or refresh the page.
The user continues to surf the site normally.
Assume the browser doesn't dump the cache for any reason.
The browser will cache images etc. as the user surfs, but it's unclear when it will issue a conditional GET request to ask about content freshness (apart from refreshing the page). If this is a browser-specific setting, where can I see its value (for browsers like Safari, IE, Firefox, Chrome)?
[edit: yes - I understand that you should always send expires headers. However, this research is aimed at understanding how the browser handles content without expires headers.]

From the HTTP caching spec (section 13.4): Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation. This means that a user agent is free to do whatever it wants if no cache-control header is sent. Most browsers use a combination of user settings and heuristics to decide whether (and for how long) to cache in this situation.
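One common heuristic, mentioned as a note in the HTTP caching specs, is to treat a response as fresh for some fraction (typically around 10%) of the time since its Last-Modified date. A minimal sketch of that idea follows; the 10% factor and the header values are assumptions for illustration, not what any particular browser guarantees:

// Sketch: heuristic freshness lifetime when no Expires/Cache-Control is sent.
// The 10% factor mirrors the note on heuristic expiration in the caching spec;
// real browsers may use different factors and caps.
function heuristicFreshnessMs(dateHeader, lastModifiedHeader) {
  const date = new Date(dateHeader).getTime();
  const lastModified = new Date(lastModifiedHeader).getTime();
  if (Number.isNaN(date) || Number.isNaN(lastModified) || lastModified > date) {
    return 0; // no usable heuristic: revalidate on next use
  }
  return (date - lastModified) * 0.1;
}

// Example: a resource last modified 10 days before it was served
// would be considered fresh for roughly 1 day.
console.log(
  heuristicFreshnessMs('Tue, 10 Jun 2025 00:00:00 GMT',
                       'Sat, 31 May 2025 00:00:00 GMT') / 86400000
); // ≈ 1 (day)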

HTTP/1.1 defines a selection of caching mechanisms; the Expires header is merely one of them, and there is also the Cache-Control header.
To directly answer your question: for a resource returned with no Expires header, you must consider the returned Cache-Control directives.
HTTP/1.1 defines no caching behaviour for a resource served with no cache-related headers. If a resource is sent with no Cache-Control or Expires headers, you must assume the client will make a regular (non-conditional) request the next time the same resource is requested.
Any deviation from this behaviour makes the client not a fully conformant HTTP client, in which case the question becomes: what behaviour is to be expected from a non-conformant HTTP client? There is no way to answer that.
HTTP caching is complex; to fully understand what a conformant client should do in a given scenario, read and understand the HTTP caching spec.

Unless you send an Expires header, most browsers will make a GET request for each subsequent refresh and will get back either HTTP 200 OK (downloading the content again) or HTTP 304 Not Modified (reusing the data in the cache).
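If you want to watch that revalidation happen independently of the browser's own heuristics, you can send the conditional header yourself. A small sketch, assuming a same-origin placeholder path and an invented Last-Modified value:

// Sketch: a manual conditional GET. If the server's copy is unchanged it
// answers 304 Not Modified with no body; otherwise 200 OK with fresh content.
async function revalidate(url, lastModified) {
  const response = await fetch(url, {
    headers: { 'If-Modified-Since': lastModified },
  });
  if (response.status === 304) {
    console.log('Not modified: the cached copy can be reused');
  } else {
    const body = await response.text();
    console.log('Changed: received', body.length, 'bytes');
  }
}

// Hypothetical usage: a same-origin path and a Last-Modified value captured
// from an earlier, unconditional response.
revalidate('/images/logo.png', 'Tue, 10 Jun 2025 00:00:00 GMT');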

Related

Do browsers allow cross-domain requests to be "sent"?

I am a newbie to website security and currently trying to understand the Same-Origin Policy in some depth.
While there are very good posts on Stack Overflow and elsewhere about the concept of SOP, I could not find updated information on whether Chrome and other browsers allow cross-domain XHR POST requests to be 'sent' in the first place.
From this 5-year-old post, it appears that Chrome allows the request to pass through to the requested server but does not allow the requester to read the response.
I tested that on my website trying to change user info on my server from a different domain. Details below:
My domain: "www.mysite.com"
Attacker domain: "www.attacker.mysite.com"
According to Same-Origin-Policy those two are considered different Origins.
User (while logged in to www.mysite.com) opens www.attacker.mysite.com and presses a button that fires a POST request to the 'www.mysite.com' server. The submitted hidden form (without tokens in this case) has all the information required to change the user's info on the 'www.mysite.com' server. Result: a successful CSRF attack; the user info does indeed change.
Now do the same, but with JavaScript submitting the form through jQuery's .post instead of a plain form submission. Result: besides Chrome giving the usual error:
No 'Access-Control-Allow-Origin' header is present on the requested resource
I found that no change was made on the server side. It seems the request didn't even leave the browser; the user info does not change at all! While that sounds good, I was expecting the opposite.
According to my understanding and the post linked above, for cross-domain requests only the server response should be blocked by the browser, not the sending of the POST request to the server in the first place.
Also, I do not have any CORS configuration set; no Access-Control-Allow-Origin headers are sent. But even if I had that set, it should only apply to 'reading' the server response, not to actually sending the request... right?
I thought of preflights, where a request is sent to check whether it's allowed by the server, thus blocking the request before its actual data is sent to change the user info. However, according to Access_Control_CORS, those preflights are only sent in specific situations, which do not apply to my simple AJAX POST request (a plain form with the default enctype of application/x-www-form-urlencoded and no custom headers).
So has Chrome changed its security behaviour to block the POST request to a cross domain in the first place?
Or am I missing something here in my understanding of the same-origin policy?
Either way, it would be helpful to know if there is a source for updated security measures implemented in different web browsers.
The behaviour of the XMLHttpRequest object has been revised over time.
The first AJAX requests were unconstrained.
When SOP was introduced, XMLHttpRequest was updated to restrict every cross-origin request:
If the origin of url is not same origin with the XMLHttpRequest origin the user agent should raise a SECURITY_ERR exception and terminate these steps.
From XMLHttpRequest Level 1, open method
The idea was that an AJAX request that couldn't read the response was useless and probably malicious, so they were forbidden.
So in general a cross-origin AJAX call would never make it to the server.
This API is now called XMLHttpRequest Level 1.
It turned out that SOP was in general too strict. Before CORS was developed, Microsoft started to supply (and tried to standardize) a new XMLHttpRequest2 API that would allow only some specific requests, stripped of any cookies and most headers.
The standardization effort failed and was merged back into the XMLHttpRequest API after the advent of CORS. The behaviour of the Microsoft API was mostly retained, but more complex (read: potentially dangerous) requests are allowed only upon specific permission from the server (through the use of pre-flights).
A POST request with non-simple headers or a non-simple Content-Type is considered complex, so it requires a pre-flight.
Pre-flights are done with the OPTIONS method and don't contain any form data, so no updates are made on the server.
When the pre-flight fails, the user agent (the browser) terminates the AJAX request, preserving the XMLHttpRequest Level 1 behaviour.
So in short: for XMLHttpRequest, the SOP was applied more strictly than its stated goals require, denying any cross-origin operation. This was possible because, at the time, that didn't break anything.
CORS loosened the policy, allowing "non-harmful" requests by default and allowing the others to be negotiated.
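To make the distinction concrete, here is a minimal sketch run from a page on a different origin than the (hypothetical) endpoint https://api.example.com; the URLs and fields are invented for illustration:

// Simple: form-encoded POST, no custom headers. The browser sends it
// immediately; only the *response* is withheld unless CORS headers allow it.
fetch('https://api.example.com/update', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: 'name=alice',
});

// Not simple: a JSON body is a non-simple Content-Type, so the browser first
// sends an OPTIONS pre-flight and only sends the POST if the server opts in
// via Access-Control-Allow-* headers.
fetch('https://api.example.com/update', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ name: 'alice' }),
});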
OK...I got it...It's neither a new policy in chrome nor missing something in SOP...
The session cookies of "www.mysite.com" were simply not being attached to the cross-domain AJAX request: by default the browser does not include credentials (cookies) in cross-origin XMLHttpRequests, so the server never saw a valid session and did not change the user's details in point (4).
Once I added xhrFields: { withCredentials:true } to my post request, I was able to change the user's information in a cross-domain XHR POST call as expected.
Although this proves the already known fact that the browser does send cross-domain POST requests to the server and only blocks the reading of the response, it might still be helpful to those trying to deepen their understanding of SOP and/or playing with CORS.
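For reference, a sketch of the credentialed call described above, assuming jQuery is loaded; the endpoint URL and form fields are placeholders:

$.ajax({
  url: 'https://www.mysite.com/account/update', // placeholder endpoint
  method: 'POST',
  data: { email: 'new@example.com' },           // placeholder form fields
  xhrFields: { withCredentials: true },         // attach the session cookies
});

// For a simple request this is sent (with cookies) even if the server never
// answers with CORS headers; those headers only govern whether the response
// is readable. To also read the response, the server would have to reply with:
//   Access-Control-Allow-Origin: <the exact requesting origin>
//   Access-Control-Allow-Credentials: true
// (the wildcard "*" is not accepted for credentialed requests).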

Same origin Policy and CORS (Cross-origin resource sharing)

I was trying to understand CORS. As per my understanding, it is a security mechanism implemented in browsers to prevent AJAX requests to any domain other than the one opened by the user (the one specified in the URL).
Now, because of this limitation, CORS was introduced to enable websites to make cross-origin requests. But as per my understanding, implementing CORS defeats the security purpose of the Same-Origin Policy (SOP).
CORS just seems to provide extra control over which requests the server wants to serve. Maybe it can keep out spammers.
From Wikipedia:
To initiate a cross-origin request, a browser sends the request with an Origin HTTP header. The value of this header is the site that served the page. For example, suppose a page on http://www.social-network.example attempts to access a user's data in online-personal-calendar.example. If the user's browser implements CORS, the following request header would be sent:
Origin: http://www.social-network.example
If online-personal-calendar.example allows the request, it sends an Access-Control-Allow-Origin header in its response. The value of the header indicates what origin sites are allowed. For example, a response to the previous request would contain the following:
Access-Control-Allow-Origin: http://www.social-network.example
If the server does not allow the cross-origin request, the browser will deliver an error to the social-network.example page instead of the online-personal-calendar.example response.
To allow access to all pages, a server can send the following response header:
Access-Control-Allow-Origin: *
However, this might not be appropriate for situations in which security is a concern.
What am I missing here? What is the intent of CORS: to secure the server, or to secure the client?
Same-origin policy
What is it?
The same-origin policy is a security measure standardized among browsers. The "origin" mostly refers to a "domain". It prevents different origins from interacting with each other, to prevent attacks such as Cross Site Request Forgery.
How does a CSRF attack work?
Browsers allow websites to store information on a client's computer, in the form of cookies. These cookies have some information attached to them, like the name of the cookie, when it was created, when it will expire, who set the cookie etc. A cookie looks something like this:
Cookie: cookiename=chocolate; Domain=.bakery.example; Path=/ [// ;otherData]
So this is a chocolate cookie, which should be accessible from http://bakery.example and all of its subdomains.
This cookie might contain some sensitive data. In this case, that data is... chocolate. Highly sensitive, as you can see.
So the browser stores this cookie. And whenever the user makes a request to a domain on which this cookie is accessible, the cookie would be sent to the server for that domain. Happy server.
This is a good thing. Super cool way for the server to store and retrieve information on and from the client-side.
But the problem is that this allows http://malicious-site.example to send those cookies to http://bakery.example, without the user knowing! For example, consider the following scenario:
// malicious-site.example/attackpage
// Forges a GET request to bakery.example; the victim's cookies for that
// domain are attached automatically by the browser.
var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://bakery.example/order/new?deliveryAddress="address of malicious user"');
xhr.send();
If you visited the malicious site, and the above code executed, and the same-origin policy did not exist, the malicious user would place an order on your behalf and have it delivered to his address... and you might not like this.
This would happen because your browser sends your chocolate cookie to http://bakery.example, which makes http://bakery.example think that you are knowingly making the request for the new order. But you aren't.
This is, in plain words, a CSRF attack. A forged request was made across sites. "Cross Site Request Forgery". And it would not work, thanks to the same-origin policy.
How does Same-origin policy solve this?
It stops the malicious-site.example from making requests to other domains. Simple.
In other words, the browser would not allow any site to make a request to any other site. It would prevent different origins from interacting with each other through such requests, like AJAX.
However, loading resources from other hosts (images, scripts, stylesheets, iframes, form submissions, etc.) is not subject to this limitation. We need another wall to protect our bakery from the malicious site, and that wall is CSRF tokens.
CSRF Tokens
As stated, the malicious site can still do something like this without violating the same-origin policy:
<img src='http://bakery.example/order/new?deliveryAddress="address of malicious user"'/>
And the browser will try to load an image from that URL, resulting in a GET request to that URL sending all the cookies. To stop this from happening, we need some server side protection.
Basically, we attach a random, unique token of suitable entropy to the user's session, store it on the server, and also send it to the client with the form. When the form is submitted, client sends that token along with the request, and server verifies if that token is valid or not.
Now if the malicious website sends the request again, it will always fail, since there is no feasible way for the malicious website to know the token for the user's session.
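As a minimal sketch of this scheme, using Node's built-in crypto module; the in-memory session map and the function names are illustrative stand-ins for real session storage and form rendering:

const crypto = require('crypto');

const sessions = new Map(); // sessionId -> csrfToken

function issueToken(sessionId) {
  const token = crypto.randomBytes(32).toString('hex'); // suitable entropy
  sessions.set(sessionId, token);
  return token; // embed this in the form as a hidden field
}

function verifyToken(sessionId, submittedToken) {
  const expected = sessions.get(sessionId);
  if (!expected || !submittedToken || submittedToken.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking the token through timing.
  return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(submittedToken));
}

// Usage sketch:
const t = issueToken('session-123');
console.log(verifyToken('session-123', t));        // true
console.log(verifyToken('session-123', 'forged')); // false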
CORS
The policy can be relaxed when cross-site requests are genuinely required. This is known as CORS: Cross-Origin Resource Sharing.
This works by having the "domains" tell the browser to chill, and allow such requests. This "telling" thing is done by passing a header. Something like:
Access-Control-Allow-Origin: // a single allowed origin, or just * (the header does not take a comma-separated list)
So if http://bakery.example sends this header to the browser, and the page creating the request to http://bakery.example matches the allowed origin, then the browser will let the request go through, along with the cookies.
There are rules according to which the origin is defined [1]. For example, different ports on the same domain are not the same origin. So the browser might decline the request if the ports differ. As always, our dear Internet Explorer is the exception: IE treats all ports the same way. This is non-standard and no other browser behaves this way. Do not rely on it.
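Here is a small sketch of the server side of that negotiation, using Node's built-in http module; the allowed origins and the port are invented for illustration:

const http = require('http');

const allowedOrigins = new Set([
  'http://trusted-partner.example',
  'http://app.bakery.example',
]);

http.createServer((req, res) => {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.has(origin)) {
    // Echo back the single matching origin; the header does not take a list.
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Vary', 'Origin');
  }
  res.end('hello');
}).listen(8080);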
JSONP
JSON with Padding is just a way to circumvent same-origin policy, when CORS is not an option. This is risky and a bad practice. Avoid using this.
What this technique involves is making a request to the other server like the following:
<script src="http://badbakery.example/jsonpurl?callback=cake"></script>
Since the same-origin policy does not prevent this request [2], the response will be loaded into the page.
This URL would most probably respond with JSON content. But just including that JSON content on the page isn't going to help; it would result in an error, of course. So http://badbakery.example accepts a callback parameter and modifies the JSON data, sending it back wrapped in whatever was passed to the callback parameter.
So instead of returning,
{ user: "vuln", acc: "B4D455" }
which is not valid stand-alone JavaScript and would throw an error, it returns
cake({user: "vuln", acc:"B4D455"});
which is valid JavaScript: it gets executed, and the data is presumably stored somewhere by the cake function so that the rest of the JavaScript on the page can use it.
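For completeness, a sketch of what the consuming page does; the URL and the cake callback are the hypothetical names from the example above:

// Define the callback the remote script will call with the wrapped JSON.
function cake(data) {
  console.log('Got account data:', data.user, data.acc);
}

// Inject the script tag; the response executes as JavaScript in the page.
const script = document.createElement('script');
script.src = 'http://badbakery.example/jsonpurl?callback=cake';
document.head.appendChild(script);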
This is mostly used by APIs to send data to other domains. Again, this is a bad practice, can be risky, and should be strictly avoided.
Why is JSONP bad?
First of all, it is very limited. You can't handle errors if the request fails (at least not in a sane way), you can't retry the request, and so on.
It also requires you to have a cake function in the global scope, which is not very good. May the cooks save you if you need to execute multiple JSONP requests with different callbacks. Various libraries solve this with temporary functions, but it is still a hackish way of doing something hackish.
Finally, you are inserting random JavaScript code in the DOM. If you aren't 100% sure that the remote service will return safe cakes, you can't rely on this.
References
1. https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy#Definition_of_an_origin
2. https://www.w3.org/Security/wiki/Same_Origin_Policy#Details
Other worthy reads
http://scarybeastsecurity.blogspot.dk/2009/12/generic-cross-browser-cross-domain.html
https://www.rfc-editor.org/rfc/rfc3986 (sorry :p)
https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)
The Same Origin Policy (SOP) is the policy browsers implement to prevent vulnerabilities via Cross Site Scripting (XSS). This is mainly for protecting the server, as there are many occasions when a server can be dealing with authentication, cookies, sessions, etc.
The Cross Origin Resource Sharing (CORS) is one of the few techniques for relaxing the SOP. Because SOP is "on" by default, setting CORS at the server-side will allow a request to be sent to the server via an XMLHttpRequest even if the request was sent from a different domain. This becomes useful if your server was intended to serve requests from other domains (e.g. if you are providing an API).
I hope this clears up the distinction between SOP and CORS and the purposes of each.

Conditional GET Request and Expiry Header testing with Firebug-NET

I'm using Firebug's Net panel to measure the performance of our application, and I'm a bit confused by the way it displays the timeline. We have enabled an Expires header for all static files (set to 30 days from the current date). Yet even when a resource is available in the cache, the browser still seems to make a conditional GET (that is what I think). Ideally it shouldn't open a connection to the server at all, yet it takes 93 ms to create a connection. Please see the image I've attached.
Can some one please help me to understand this better?
The HTTP response contains an ETag header. The ETag is a cache validator.
An HTTP client that sees this ETag will check back with the server to verify whether the content has been updated.
A cache validator takes precedence over the other cache-control headers.
If you want content to be served from the cache without being revalidated against the server, keep only the Expires header and remove the ETag header.
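If it helps, here is a minimal sketch of that setup using Node's built-in http module; the file path, content type, and port are made up. It sends only freshness headers and no validators, so a conforming cache can reuse the response without contacting the server until it expires:

const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  const body = fs.readFileSync('./static/logo.png'); // placeholder asset
  const thirtyDays = 30 * 24 * 60 * 60; // seconds
  res.writeHead(200, {
    'Content-Type': 'image/png',
    'Cache-Control': 'max-age=' + thirtyDays,
    'Expires': new Date(Date.now() + thirtyDays * 1000).toUTCString(),
    // deliberately no ETag and no Last-Modified, so there is nothing to revalidate
  });
  res.end(body);
}).listen(8080);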

Do XMLHttpRequests cache the response headers?

If I make an XMLHttpRequest and the browser caches the response, will it also cache the HTTP response headers? That is, the next time I make the same request, will I get the same values back from response.getResponseHeader?
Is this browser-dependent?
I'm certain at least the major browsers don't try to cache the headers. However, if you want to prevent caching altogether, you may need to send special headers. If you want to do a quick test of caching behaviors, there's a page here:
http://www.mnot.net/javascript/xmlhttprequest/cache.html
And I recommend, if you want to see what's actually happening, that you get a packet sniffer such as Wireshark and see for yourself. I can imagine the browser at least performs a HEAD request for an XMLHttpRequest even if it gives you the cached body, but I could be wrong.
Headers are either not cached or not reused. Since a request has to be sent anyway, headers are received with the response, and only then can the browser decide whether the cached entry is still valid. The new headers have by then already been downloaded, so there is no point in reusing the old ones.
Only the response body is cached.
You could of course very easily test this: make an XHR request to a static resource (an image or a text file) and check the Date header (see the sketch at the end of this answer).
I don't think it's browser dependent. A browser caching and reusing HTTP headers would be very, very strange.
edit
jQuery adds (by default, I think) anti-caching arguments to GET requests (very annoying), which would sort of answer your question: nothing is cached like that.
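A rough sketch of the test suggested above; the resource path is a placeholder:

// Request the same static resource twice and compare the Date headers.
// If the second response comes from the cache, its Date header is the one
// that was stored with the cached entry.
function fetchDate(url, label) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = () => console.log(label, xhr.getResponseHeader('Date'));
  xhr.send();
}

fetchDate('/static/logo.png', 'first:');
// A little later, repeat the request; a cached reuse shows the same Date.
setTimeout(() => fetchDate('/static/logo.png', 'second:'), 5000);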

How to detect whether web page content is different from cached version

Hi guys
As you know, checking web page content for changes is a little different from checking static pages or personal files on our machines, because the content of dynamic web pages changes on each request. So if we use checksums to identify changes, we'll fail! A very simple example is when the site owner uses Google Ads on the website; on each request the ads are different from the previous ones. Also, if we simply cache for a fixed period of time, we'll fail as well, because some web pages aren't updated for years while others change every minute (if not every second).
So what is the better approach to solving this issue? (Thanks)
UPDATE
Another option is to use the Last-Modified HTTP header, but is that a robust approach?
Browsers do this automatically with the help of the several caching mechanisms that HTTP provides. The two mechanisms most obviously useful for determining whether a page has changed are Entity Tags and the Last-Modified HTTP header. These mechanisms allow the browser to make conditional requests to a web site, e.g. fetch a page only if it has changed.
Quoting RFC 2616 on HTTP 1.1:
3.11 Entity Tags
Entity tags are used for comparing two or more entities from the same requested resource. HTTP/1.1 uses entity tags in the ETag (section 14.19), If-Match (section 14.24), If-None-Match (section 14.26), and If-Range (section 14.27) header fields. The definition of how they are used and compared as cache validators is in section 13.3.3. An entity tag consists of an opaque quoted string, possibly prefixed by a weakness indicator.
The key point here is that the ETag is a cache validator. If a browser has a cached version of a page (called a resource in the RFC), it can use the ETag to determine whether the cached page is still valid, ie. if the page hasn't changed on the server.
And about the modification date:
14.25 If-Modified-Since
The If-Modified-Since request-header field is used with a method to make it conditional: if the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (not modified) response will be returned without any message-body.
The key point here is that the server may know when a page has been modified, and may then inform the client.
If you open a HTTP monitor (such as Fiddler for Windows) and watch your browser communicate with web sites, you'll see the use of these mechanisms first-hand when the browser makes conditional requests.
To specifically address your question about the Last Modified header, this header in itself won't work for the majority of pages you'll find. But in combination with the ETag it can get you started.
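As a concrete illustration of using both validators together, here is a hedged sketch; the URL is a placeholder and it assumes a runtime with a global fetch (Node 18+, or a browser against a same-origin URL):

// Detect changes by replaying the validators from the previous response
// instead of hashing page bodies.
async function hasChanged(url, previous = {}) {
  const headers = {};
  if (previous.etag) headers['If-None-Match'] = previous.etag;
  if (previous.lastModified) headers['If-Modified-Since'] = previous.lastModified;

  const response = await fetch(url, { headers });
  if (response.status === 304) {
    return { changed: false, validators: previous };
  }
  return {
    changed: true,
    validators: {
      etag: response.headers.get('ETag'),
      lastModified: response.headers.get('Last-Modified'),
    },
    body: await response.text(),
  };
}

// First call fetches and records validators; later calls report 304 -> unchanged.
hasChanged('https://example.com/page').then(r => console.log(r.changed));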

Resources