Does jsoup.connect().get() return cached Document? - caching

I use jsoup and the following code to get the HTML content of a website:
Document doc = Jsoup.connect(this.getUrl()).get();
Do I get a cached version of the website? Is it possible to request a non-cached version? I know I could set a request header, something like:
header("Cache-control", "no-cache");
header("Cache-store", "no-store");
But I'm not sure whether that works. I only know that these headers are used by client browsers.
It would be awesome if someone could clarify. Greetings.

Any headers that you specify correctly (per the HTTP spec) will be sent to the target host via java.net.URLConnection.addRequestProperty(String, String). If the server, and any proxy in between, honors the header end-to-end, you should get a fresh (non-cached) version of the page. jsoup just passes your headers along with the request it makes, and when I looked through the source, it does not make any explicit effort to cache the response content.
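A minimal plain-JDK sketch of what this answer describes, assuming (as the answer says) that the headers end up on a java.net.URLConnection. Note that Cache-store is not an HTTP header: no-store is a directive of Cache-Control, and the HTTP/1.0 counterpart of no-cache is Pragma: no-cache. The class name NoCacheRequest is made up for illustration, and nothing is sent until the caller connects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class NoCacheRequest {
    // Creates, but does not open, an HTTP connection that asks caches on the
    // path for a fresh copy. Call connect() or getInputStream() to send it.
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Do not consult a JVM-wide java.net.ResponseCache, if one is installed
            conn.setUseCaches(false);
            // HTTP/1.1: a cache must revalidate with the origin before answering
            conn.setRequestProperty("Cache-Control", "no-cache");
            // HTTP/1.0 equivalent, for older proxies
            conn.setRequestProperty("Pragma", "no-cache");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With jsoup itself the same headers can be passed through Connection.header(name, value), e.g. Jsoup.connect(url).header("Cache-Control", "no-cache").header("Pragma", "no-cache").get(). Whether the response is really fresh still depends on the server and any proxy honoring them.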

Related

Force chrome to cache static content like images

I want to improve my browsing experience by caching static content like images (jpg, png, gif) and fonts. When I view a page with a lot of images and then refresh with F5, the same content is always downloaded again.
I know that this happens because the response headers can contain no-cache or max-age=0, and it sometimes happens even when the response has no cache or max-age header at all.
But for images or fonts that never change, max-age=0 is useless. So I wanted to know if there is a way to override the response headers and set max-age to 1 year. Maybe with a Chrome extension?
Yes, you can do this with a Chrome extension. The Change HTTP Headers Chrome extension already does it.
For your specific case, you just need to do this:
Add an event listener which should be called whenever headers are received.
Read the details of the headers
Check if the response content type is an image
Add/Update the desired header to the headers
To accomplish this, you can use the webRequest onHeadersReceived event.
Documentation of onHeadersReceived
onHeadersReceived (optionally synchronous): Fires each time that an HTTP(S) response header is received. Due to redirects and authentication requests this can happen multiple times per request. This event is intended to allow extensions to add, modify, and delete response headers, such as incoming Set-Cookie headers.
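Your code will look something like this:
chrome.webRequest.onHeadersReceived.addListener(function (details) {
  var headers = details.responseHeaders;
  // Drop any caching headers the server sent (iterate backwards while splicing)
  for (var i = headers.length - 1; i >= 0; i--) {
    var name = headers[i].name.toLowerCase();
    if (name === 'cache-control' || name === 'expires' || name === 'pragma') {
      headers.splice(i, 1);
    }
  }
  // The filter below only matches images, so add a one-year max-age (31536000 s)
  headers.push({name: 'Cache-Control', value: 'public, max-age=31536000'});
  return {responseHeaders: headers};
},
{urls: ['https://*/*'], types: ['image']},
['blocking', 'responseHeaders']);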
PS: I have not run and tested the code so please excuse the typos.
EDIT (after @RobW's comment)
No, this is not possible as of now (22 March 2014). Adding Cache-Control has no influence on the caching behavior. Check out this answer for more details.
Although this is an old question, I stumbled upon it recently. Later I found the Chrome extension "Speed-Up Browsing", which seems to do exactly what the OP asked for.
For me it worked.
https://chrome.google.com/webstore/detail/speed-up-browsing/hkhnldpdljhiooeallkmlajnogjghdfb?hl=en
You could use Fiddler https://www.telerik.com/fiddler (free) as a proxy that adds caching for selected URLs or patterns. I did that a

Disable cache in ExtLib REST control (which uses dojox.data.JsonRestStore)

In my XPage I have a xe:djxDataGrid (dojox.grid.datagrid) which uses xe:restService which seems to use dojox.data.JsonRestStore.
Everything works fine without a proxy, but my client accesses the application via a proxy because of corporate policy. After a user updates data in the DataGrid, it shows old values when accessed behind the proxy.
When the REST control/JsonRestStore sends an AJAX GET request to get data, there is no Cache-Control parameter in the request headers, and Domino does not put an Expires header in the response headers. I believe that's why the old version of the GET request gets cached by the proxy.
We have tried to disable cache in browsers but that does not help which indicates the proxy is caching the requests.
I believe this could be solved either by:
Setting Cache-Control parameter in request headers OR
Setting Expires parameter in response headers
But I haven't found a way to set either of these. For the XPage itself, Domino sets an Expires: -1 response header, but not for the AJAX GET request, which is:
/mypage.xsp/?$$viewid=!ddrg6o7q1z!&$$axtarget=view:_id1:_id2:callback1:restService1
This returns the JSON data to JsonRestStore and gets cached by the proxy.
One option is to try to get an exception in the proxy so that requests to this site would bypass the proxy cache. But exceptions are generally not easy to get through.
Any ideas? Thanks.
Update1
My colleague suggested that I could intercept the XHR GET requests made by dojox.data.JsonRestStore and add a time parameter to the URL to prevent caching. Here is my question about that:
Prevent cache in every Dojo xhr request on page
Update2
@SvenHasselbach has a great solution for preventing caching of all XHRs:
http://openntf.org/XSnippets.nsf/snippet.xsp?id=cache-prevention-for-dojo-xhr-requests
It seems to work perfectly, &dojo.preventCache= parameter is added to the URLs and the requests seem to return correct JSON also with this parameter. But the DataGrid stops working when I use that code. Every xhr causes this error:
Tried with Firefox and Chrome. The first page of data still loads because xhr interception is not yet in place but the subsequent pages show only "..." in each cell.
The solution is Sven Hasselbach's code in the comment section of Julian Buss's blog which needs to be slightly modified.
I changed xhrPost to xhrGet and did not put the code in dojo.addOnLoad; when placed there, it was not effective for the first XHR made by the DataGrid/Store.
I also removed the headers modification because it overrides existing headers. When the REST control requests data from server with xhrGet the URL is always the same and rows requested are in HTTP header like this:
Range: items=0-9
This (and other) headers disappear when the original code is used. To just add headers, we would have to take the existing headers from args and append to them. I didn't see a need for that, because it should be enough to add the parameter to the URL. Here is the extremely simple code I'm using:
// Keep a reference to the original dojo.xhrGet (only once, even if this script runs twice)
if (!dojo._xhrGet) {
    dojo._xhrGet = dojo.xhrGet;
}
// Wrap it so every GET request asks dojo to add its cache-busting parameter
dojo.xhrGet = function (args) {
    args.preventCache = true;
    return dojo._xhrGet(args);
};
Now I'm getting all rows and all XHR Get URLs have &dojo.preventCache= parameter which is exactly what I wanted. Next we'll test in customer environment to see if this solves their problem.
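The preventCache flag amounts to a cache-busting query parameter: browser and proxy caches are keyed on the URL, so a parameter whose value changes on every request makes each URL new to them. A rough Java sketch of that idea follows. The helper names are hypothetical, and using the current time in milliseconds as the value is an assumption about what dojo does (the source only shows the &dojo.preventCache= name).

```java
class CacheBuster {
    // Appends name=value to the query string: '?' if the URL has no query yet,
    // '&' otherwise. A "#fragment", if present, stays at the end.
    static String withParam(String url, String name, long value) {
        int hash = url.indexOf('#');
        String base = hash >= 0 ? url.substring(0, hash) : url;
        String fragment = hash >= 0 ? url.substring(hash) : "";
        char separator = base.indexOf('?') >= 0 ? '&' : '?';
        return base + separator + name + "=" + value + fragment;
    }

    // Hypothetical helper mirroring the preventCache idea: the value changes on
    // every call, so caches see a URL they have (almost certainly) not stored.
    static String preventCache(String url) {
        return withParam(url, "dojo.preventCache", System.currentTimeMillis());
    }
}
```

For example, CacheBuster.withParam("/data", "_", 5L) gives /data?_=5. Two calls within the same millisecond would produce the same URL, which is usually acceptable for this purpose.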
Update
As Julian points out in his blog I could also use a Web Site Rule to set Expires or cache-control http response headers.
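Whichever mechanism sets them (a Domino Web Site Rule, a servlet filter, a proxy rule), the response headers that keep the JSON out of browser and proxy caches usually look like the ones below. This Java sketch only builds the header values; it is not Domino configuration, and the directive set is a common choice rather than the only correct one. Expires must be an HTTP-date, so a past date is formatted explicitly.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class NoCacheResponseHeaders {
    // HTTP-date (IMF-fixdate) format, e.g. "Sun, 06 Nov 1994 08:49:37 GMT"
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    static String httpDate(Instant t) {
        return HTTP_DATE.format(t);
    }

    // Headers to attach to the JSON responses so that neither browsers
    // nor shared proxies reuse a stored copy.
    static Map<String, String> build() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Cache-Control", "no-cache, no-store, must-revalidate");
        h.put("Pragma", "no-cache");                 // for HTTP/1.0 caches
        h.put("Expires", httpDate(Instant.EPOCH));   // a date in the past
        return h;
    }
}
```

NoCacheResponseHeaders.build() yields Cache-Control: no-cache, no-store, must-revalidate, Pragma: no-cache and Expires: Thu, 01 Jan 1970 00:00:00 GMT.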
Update
The customer reports it's working now for them!

Serving content depending on http accept header - caching problems?

I'm developing an application which is supposed to serve different content for "normal" browser requests and AJAX requests for the same URL requested.
(In fact, the response HTML is wrapped in a JSON object if the request is AJAX.)
For this purpose, I'm detecting an AJAX request on the server side, and processing the response appropriately, see the pseudocode below:
function process_response(request, response)
{
    if request.is_ajax
    {
        response.headers['Content-Type'] = 'application/json';
        response.headers['Cache-Control'] = 'no-cache';
        response.content = JSON( some_data... )
    }
}
The problem is that when the first AJAX request to the currently viewed URL is made, strange things happen in Google Chrome. If, right after the response arrives and is processed via JavaScript, the user clicks a link (a static one that leads to another page) and then clicks the browser's Back button, he sees the returned JSON instead of the rendered website (judging by the server log, no request is made). It seems to me that Chrome stores the latest response for the specific URL and doesn't take into account that it has a different content type etc.
Is that a bug in Chrome, or am I misusing the HTTP protocol?
--- update 12 11 2012, 12:38 UTC
Following PatrikAkerstrand's answer, I found the following Chrome bug: http://code.google.com/p/chromium/issues/detail?id=94369
Any ideas how to avoid this behaviour?
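You should also include a Vary header. Vary lists the request headers that select the representation, so name the one your AJAX detection looks at (Accept here, or X-Requested-With if that is what is_ajax checks) and send it on both the HTML and the JSON responses:
response.headers['Vary'] = 'Accept'
Vary is the standard way to control the caching context in content negotiation. Unfortunately, some browsers also have buggy implementations of it; see Browser cache vary broken.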
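A sketch of the question's pseudocode in Java with the Vary header added. Vary names request headers, so when the representation is chosen from the Accept header the value is Accept, and it is sent on both the HTML and the JSON variant. The class, the handle method and the contains("application/json") check (which ignores q-values) are simplifications for illustration, not a framework API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NegotiatedResponse {
    final Map<String, String> headers = new LinkedHashMap<>();
    String body;

    // Hypothetical handler: same URL, JSON for AJAX callers and HTML otherwise,
    // chosen from the Accept request header.
    static NegotiatedResponse handle(String acceptHeader, String html) {
        NegotiatedResponse r = new NegotiatedResponse();
        boolean wantsJson = acceptHeader != null && acceptHeader.contains("application/json");
        if (wantsJson) {
            r.headers.put("Content-Type", "application/json");
            r.body = "{\"html\":" + jsonString(html) + "}";
        } else {
            r.headers.put("Content-Type", "text/html; charset=utf-8");
            r.body = html;
        }
        // On both variants: the stored response depends on the Accept header
        r.headers.put("Vary", "Accept");
        return r;
    }

    // Minimal JSON string escaping (quotes, backslashes, control characters)
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Given the browser bug linked in the question, Vary may not be enough on its own in Chrome; the unique-URL approach below avoids relying on it.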
I would suggest using unique URLs.
Depending on your framework's capabilities, you can redirect (302) the browser to URL + .html to force the response format and make the cache key unique within the browser session. For AJAX requests you can then keep the suffix-less URL. Alternatively, you may suffix the AJAX URL with .json instead.
Other options are prefixing AJAX requests with /api or adding a cache-busting query parameter such as ?rand=1234.
Setting cache-control to no-store made it in my case, while no-cache didn't. This may have unwanted side effects though.
no-store: The response may not be stored in any cache. Although other directives may be set, this alone is the only directive you need in preventing cached responses on modern browsers.
Source: Mozilla Developer Network - HTTP Cache-Control
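To make the no-cache/no-store distinction concrete, here is a simplified Java reading of a Cache-Control response value: no-store forbids keeping a copy at all, while no-cache (and, in practice, max-age=0) still allows storing it but means it must be revalidated with the server before reuse. The class is a sketch and ignores most directives and quoted arguments.

```java
import java.util.Locale;

class CacheControlPolicy {
    final boolean mayStore;       // false when "no-store" is present
    final boolean mustRevalidate; // true for "no-cache" or "max-age=0"

    private CacheControlPolicy(boolean mayStore, boolean mustRevalidate) {
        this.mayStore = mayStore;
        this.mustRevalidate = mustRevalidate;
    }

    // Simplified: splits on commas and ignores quoted arguments and all
    // directives other than no-store, no-cache and max-age.
    static CacheControlPolicy parse(String headerValue) {
        boolean noStore = false, noCache = false, maxAgeZero = false;
        for (String part : headerValue.split(",")) {
            String d = part.trim().toLowerCase(Locale.ROOT);
            if (d.equals("no-store")) {
                noStore = true;
            } else if (d.equals("no-cache")) {
                noCache = true;
            } else if (d.startsWith("max-age=")) {
                maxAgeZero = d.substring("max-age=".length()).equals("0");
            }
        }
        return new CacheControlPolicy(!noStore, noCache || maxAgeZero);
    }
}
```

This is why no-cache alone can still leave a stored copy that a buggy back-button cache might show, whereas no-store asks that nothing be kept.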

Should I be using POST or GET when retrieving JSON data into jqGrid in my ASP.NET MVC application?

I am using jqGrid in my ASP.NET MVC application. Currently I have mtype: 'POST', like this:
jQuery("#myGrid").jqGrid({
    mtype: 'POST',
    toppager: true,
    footerrow: haveFooter,
    userDataOnFooter: haveFooter,
    // ... other options
});
But I was reading this article, and I see this paragraph:
Browsers can cache images, JavaScript, CSS files on a user's hard drive, and it can also cache XML HTTP calls if the call is a HTTP GET. The cache is based on the URL. If it's the same URL, and it's cached on the computer, then the response is loaded from the cache, not from the server when it is requested again. Basically, the browser can cache any HTTP GET call and return cached data based on the URL. If you make an XML HTTP call as HTTP GET and the server returns some special header which informs the browser to cache the response, on future calls, the response will be immediately returned from the cache and thus saves the delay of network roundtrip and download time.
Given this is the case, should I switch my jqGrid mtype from "POST" to "GET"? (The article says XML and doesn't mention JSON.) If the answer is yes, in what situation would I ever want to use POST for the jqGrid mtype, since it seems to do the same thing without the caching benefit?
The problem you describe can occur in Internet Explorer, but it does not exist in jqGrid if you use the default options.
If you look at the full URL which will be used you will see parameters like
nd=1339350870256
It has the same meaning as cache: false in jQuery.ajax: jqGrid adds the current timestamp to the URL to make it unique.
I personally like to use HTTP GET in jqGrid, but I don't like the usage of the nd parameter; I described the reason in an old answer. It would be better to use the prmNames: {nd:null} option of jqGrid, which removes the nd parameter from the URL. Instead, one can control the caching on the server side. For example, the setting
Cache-Control: private, max-age=0
is my standard setting. To set this HTTP header, you just need to include the following line in the code of the ASP.NET MVC action:
HttpContext.Current.Response.Cache.SetMaxAge(new TimeSpan(0));
You can find more details in the answer.
It's important to understand that the header Cache-Control: private, max-age=0 doesn't prevent caching of the data; it means the cached data will never be used without revalidation on the server. Using another HTTP header, ETag, you can make the revalidation actually work. The main idea is that the value of ETag always changes when the data on the server change. If the previous data are already in the web browser cache, the browser automatically sends an If-None-Match header in the HTTP request with the ETag value of the cached data. If the server sees that the data have not changed, it can answer with an HTTP 304 Not Modified response with an empty body, which allows the web browser to use its locally cached data.
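The ETag round trip described above can be sketched as server-side logic in Java. Deriving the ETag from a SHA-256 hash of the response body is an assumption made here (any value that changes whenever the data change works), and the class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ETagRevalidation {
    // Strong ETag derived from the body: it changes whenever the data change.
    static String etagFor(String body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(body.getBytes(StandardCharsets.UTF_8));
            return "\"" + String.format("%064x", new BigInteger(1, digest)) + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available", e);
        }
    }

    // 304 (empty body, browser reuses its cached copy) if the If-None-Match
    // value sent by the browser matches the current ETag; otherwise 200.
    static int status(String ifNoneMatch, String currentBody) {
        if (ifNoneMatch == null) return 200;
        String etag = etagFor(currentBody);
        for (String candidate : ifNoneMatch.split(",")) {
            String c = candidate.trim();
            // If-None-Match uses weak comparison, so W/"x" also matches "x"
            if (c.equals("*") || c.equals(etag) || c.equals("W/" + etag)) return 304;
        }
        return 200;
    }
}
```

The server would send the ETag header together with Cache-Control: private, max-age=0 on every 200 response, and no body on a 304.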
In the answer and in this one you will find the code example how to use ETag approach.
If the data that the server sends changes, then you should use POST to avoid getting cached data every time you request it.
You should not use GET for all purposes. GET requests are supposed to be used for getting data from the server, not for saving or deleting operations. GET requests also have a limitation: since the data you send to the server is appended as a query string, you can't send very large data with GET requests. You also should not use GET requests to send sensitive information to the server. You should use POST requests in all the other cases, like adding, editing and deleting.
As far as I'm aware jqgrid appends a unique key in every GET request so you don't get any benefit from browser caching.
One way around the caching behavior is to make the GET unique each time the request is made. jQuery.ajax() does this with "cache: false" by appending a timestamp to the end of the request. You can replicate this behavior with something similar:
uri = uri + (uri.indexOf('?') === -1 ? '?' : '&') + '_=' + (new Date()).getTime(); // uri represents the URI to the endpoint; use '&' if it already has a query string

VBScript: Disable caching of response from server to HTTP GET URL request

I want to turn off the cache used when a URL call to a server is made from VBScript running within an application on a Windows machine. What function/method/object do I use to do this?
When the call is made for the first time, my Linux based Apache server returns a response back from the CGI Perl script that it is running. However, subsequent runs of the script seem to be using the same response as for the first time, so the data is being cached somewhere. My server logs confirm that the server is not being called in those subsequent times, only in the first time.
This is what I am doing. I am using the following code from within a commercial application (don't wish to mention this application, probably not relevant to my problem):
With CreateObject("MSXML2.XMLHTTP")
.open "GET", "http://myserver/cgi-bin/nsr/nsr.cgi?aparam=1", False
.send
nsrresponse =.responseText
End With
Is there a function/method on the above object to turn off caching, or should I be calling a method/function to turn off the caching on a response object before making the URL?
I looked here for a solution: http://msdn.microsoft.com/en-us/library/ms535874(VS.85).aspx - not quite helpful enough. And here: http://www.w3.org/TR/XMLHttpRequest/ - very unfriendly and hard to read.
I am also trying to force not using the cache using http header settings and html document header meta data:
Snippet of the server-side Perl CGI script that returns the response to the calling client; it sets the expiry to 0:
print $httpGetCGIRequest->header(
-type => 'text/html',
-expires => '+0s',
);
HTML document head settings in the response sent back to the client:
<html><head><meta http-equiv="CACHE-CONTROL" content="NO-CACHE"></head>
<body>
response message generated from server
</body>
</html>
The above http header and html document head settings haven't worked, hence my question.
I don't think the XMLHTTP object itself even implements caching.
You send a fresh request as soon as you call .send() on it. The whole point of caching is to avoid sending requests, but that does not happen here (as far as your code sample goes).
But if the object is used in a browser of some sort, then the browser may implement caching. In this case the common approach is to include a cache-breaker in the request URL: a random URL parameter that you change every time you make a new request (for example, by appending the current time to the URL).
Alternatively, you can make your server send a Cache-Control: no-cache, no-store HTTP-header and see if that helps.
The <meta http-equiv="CACHE-CONTROL" content="NO-CACHE"> is probably useless, and you can drop it entirely.
You could use WinHTTP, which does not cache HTTP responses. You should still add the cache control directive (Cache-control: no-cache) using the SetRequestHeader method, because it instructs intermediate proxies and servers not to return a previously cached response.
If you have control over the application targeted by the XMLHTTP Request (which is true in your case), you could let it send no-cache headers in the Response. This solved the issue in my case.
Response.AppendHeader("pragma", "no-cache");
Response.AppendHeader("Cache-Control", "no-cache, no-store");
As alternative, you could also append a querystring containing a random number to each requested url.
