Safari doesn't cache resources across different domains

Let's say we have several different websites: website1.com, website2.com, website3.com. We use jQuery on all of them and include it from a CDN like googleapis.com. The expected behavior from a browser would be to cache it once and use it for all other websites. Chrome seems to do this, but Safari downloads jQuery again for every domain.
Example
With the JS code given below, open nytimes.com, bbc.com and dw.de in Chrome.
Append jQuery on the first website and look at the Network tab of your DevTools. It will say that it got jQuery.
Now open any other website and append jQuery again — the answer will be “from cache”.
Safari, however, will say it's loading jQuery for every domain. But open any other page on one of those domains and append the script again: now it says it got jQuery from cache. So it looks like Safari caches data per domain, even if it has already downloaded a resource from the exact same URL for another domain.
Is this assumption correct and if so, how to fix it?
Code you can copy/paste:
setTimeout(function() {
    var SCRIPT_SRC = '//ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js';
    var s = document.createElement('script');
    s.type = 'text/javascript';
    s.async = true;
    s.src = SCRIPT_SRC;
    var x = document.getElementsByTagName('script')[0];
    x.parentNode.insertBefore(s, x);
}, 0);
UPD: Tested it with a static image.
test.com, test2.com and test3.com have <img src="http://image.com/image.jpg" />. In all browsers except for Safari access log shows only one — first — request for the image. Safari gets the image for every new domain (but not a subdomain).

I've noticed this too, and I suspect it is for privacy reasons.
By default, Safari blocks third-party cookies. A third-party cookie is a cookie set by b.com for a resource that is requested by a.com. This can be used, for example, to track people across domains. You can have a script on b.com that is requested by a.com and by c.com. b.com can insert a unique client ID into this script based on a third-party cookie, so that a.com and c.com can track that this is the same person.
Safari blocks this behavior. If b.com sets a cookie for a resource requested by a.com, Safari will box that cookie so it is only sent to b.com for more requests by a.com. It will not be sent to b.com for requests by c.com.
Now enter caching, and specifically the Etag header. An Etag is an arbitrary string (usually a hash of the file) that can be used to determine whether the requested resource has changed since the person last requested it. This is normally a good thing: it saves re-sending the entire file if it has not changed.
However, because an Etag is an arbitrary string, b.com can set it to include a client ID. This is called Etag tracking. It allows tracking a person across domains in almost exactly the same way as cookies do.
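To make the mechanism concrete, here is a small sketch of how a tracking server on b.com could abuse the Etag; the Express app and the /tracker.js endpoint are purely illustrative and not taken from any real site:
var express = require('express');
var crypto = require('crypto');

var app = express();
app.set('etag', false); // don't let Express add its own ETag

app.get('/tracker.js', function (req, res) {
    // When the browser revalidates a cached copy, it echoes our ETag back here.
    var echoed = req.headers['if-none-match'];
    var clientId = echoed ? echoed.replace(/"/g, '') : crypto.randomBytes(16).toString('hex');

    // The "cache validator" is really a persistent, unique client ID.
    res.set('ETag', '"' + clientId + '"');
    res.set('Cache-Control', 'no-cache'); // forces revalidation, so the ID keeps coming back
    res.type('application/javascript');
    res.send('/* tracking script for client ' + clientId + ' */');
});

app.listen(3000);
With a shared cache, that ID would follow the user from a.com to c.com; with Safari's per-domain cache it cannot.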
Summary: By not sharing the cache across domains, Safari protects you from cross-domain Etag tracking.

This is by design, part of what the Safari team calls Intelligent Tracking Prevention - https://webkit.org/blog/7675/intelligent-tracking-prevention/ - and the cache is double-keyed on the document origin and the third-party origin.
Based on research using HTTP Archive data, and the Yahoo and Facebook studies on cache lifetimes, I doubt shared caching of jQuery etc. is effective: not enough sites use the same versions of the libraries, and the libraries don't stay in cache for very long. So Safari's behaviour helps prevent tracking while not really affecting performance.

Rather than simply adding a DOM element, you could try using XMLHttpRequest. It lets you define custom headers -- one of which is Cache-Control.
Give this a shot; it should override whatever's going on at the browser level:
(function () {
    var newRequest = function () {
        return (window.XMLHttpRequest) ? new XMLHttpRequest() : new ActiveXObject('MsXml2.XmlHttp');
    };
    var loadScript = function (url) {
        var http = newRequest();
        http.onreadystatechange = function () {
            if (http.readyState === 4) {
                if (http.status === 200 || http.status === 304) {
                    appendToPage(http.responseText);
                }
            }
        };
        http.open('GET', url, true);
        // Headers can only be set after open(). This is where you set your cache
        http.setRequestHeader('Cache-Control', 'max-age=0'); // <-- change this to a value larger than 0
        http.send(null);
    };
    var appendToPage = function (source) {
        if (source === null) return false;
        var head = document.getElementsByTagName('head')[0];
        var script = document.createElement('script');
        script.type = 'text/javascript';
        script.defer = true;
        script.text = source;
        head.appendChild(script);
    };
    loadScript('//ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js');
})();
Note: Safari has had some issues with caching in the past. However, from what I understand it was mostly about serving stale content -- not the other way around.

Here are some suggestions:
Have you checked that the "Disable cache" option is not enabled in your dev tools?
Are you looking at the HTTP status codes in the Network panel of your dev tools?
Have you tried capturing the traffic with a tool like Wireshark?
Best regards.

Related

Direct (and simple!) AJAX upload to AWS S3 from (AngularJS) Single Page App

I know there's been a lot of coverage on upload to AWS S3. However, I've been struggling with this for about 24 hours now and I have not found any answer that fits my situation.
What I'm trying to do
Upload a file to AWS S3 directly from my client to my S3 bucket. The situation is:
It's a Single Page App, so upload request must be in AJAX
My server and my client are not on the same domain
The S3 bucket is of the newest sort (Frankfurt), for which some signature-generating libraries don't work (see below)
Client is in AngularJS
Server is in ExpressJS
What I've tried
Heroku's article on direct upload to S3. Doesn't fit my client/server configuration (plus it really does not fit harmoniously with Angular)
ready-made directives like ng-s3upload. Does not work because their signature-generating algorithm is not accepted by recent s3 buckets.
Manually creating a file upload directive and logic on the client like in this article (using FormData and Angular's $http). It consisted of getting a signed URL from AWS on the server (and that part worked), then AJAX-uploading to that URL. It failed with some mysterious CORS-related message (although I did set a CORS config on Heroku)
It seems I'm facing 2 difficulties: having a file input that works in my Single Page App, and getting AWS's workflow right.
The kind of solution I'm looking for
If possible, I'd like to avoid 'all included' solutions that manage the whole process while hiding all of the complexity, making them hard to adapt to special cases. I'd much rather have a simple explanation breaking down the flow of data between the various components involved, even if it requires some more plumbing from me.
I finally managed. The key points were:
Let go of Angular's $http, and use native XMLHttpRequest instead.
Use the getSignedUrl feature of AWS's SDK, instead of implementing my own signature-generating workflow like many libraries do.
Set the AWS configuration to use the proper signature version (v4 at the time of writing) and region ('eu-central-1' in the case of Frankfurt).
Below is a step-by-step guide of what I did; it uses AngularJS on the client and NodeJS on the server, but should be rather easy to adapt to other stacks, especially because it deals with the most pathological cases (an SPA on a different domain than the server, with a bucket in a region that was recent at the time of writing).
Workflow summary
The user selects a file in the browser; your JavaScript keeps a reference to it.
The client sends a request to your server to obtain a signed upload URL.
Your server chooses a name for the object to put in the bucket (make sure to avoid name collisions! See the sketch after this list).
The server obtains a signed URL for your object using the AWS SDK, and sends it back to the client. This involves the object's name and the AWS credentials.
Given the file and the signed URL, the client sends a PUT request directly to your S3 Bucket.
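For step 3, here is a minimal sketch of one way to build a collision-resistant object name. It assumes Node's built-in crypto and path modules; the makeObjectName helper and the naming scheme are my own illustration, not part of the original answer.
var crypto = require('crypto');
var path = require('path');

// Build an object key that is very unlikely to collide, while keeping the
// original extension so the file type stays recognisable.
function makeObjectName(originalFileName) {
    var extension = path.extname(originalFileName);          // e.g. '.png'
    var randomPart = crypto.randomBytes(8).toString('hex');  // 16 hex characters
    return Date.now() + '-' + randomPart + extension;
}

// makeObjectName('avatar.png') -> something like '1430000000000-9f86d081884c7d65.png'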
Before you start
Make sure that:
Your server has the AWS SDK
Your server has AWS credentials with proper access rights to your bucket
Your S3 bucket has a proper CORS configuration for your client.
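On that last point, here is a hedged sketch of what a matching CORS configuration might look like, applied through the SDK's putBucketCors call (the bucket name and allowed origin are placeholders to adapt to your setup):
var aws = require('aws-sdk');
var s3 = new aws.S3();

s3.putBucketCors({
    Bucket: 'MY_BUCKET_NAME', // placeholder
    CORSConfiguration: {
        CORSRules: [{
            AllowedOrigins: ['https://my-spa.example.com'], // your client's origin
            AllowedMethods: ['PUT', 'GET'],                 // PUT for the upload, GET to read it back
            AllowedHeaders: ['*'],
            MaxAgeSeconds: 3000
        }]
    }
}, function (err) {
    if (err) console.error('could not set the CORS configuration', err);
});
You can of course set the same rules once by hand in the S3 console instead.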
Step 1: set up a SPA-friendly file upload form / widget.
All that matters is to have a workflow that eventually gives you programmatic access to a File object - without uploading it.
In my case, I used the ng-file-select and ng-file-drop directives of the excellent angular-file-upload library. But there are other ways of doing it (see this post, for example).
Note that you can access useful information in your file object such as file.name, file.type etc.
Step 2: Get a signed URL for the file on your server
On your server, you can use the AWS SDK to obtain a secure, temporary URL to PUT your file from someplace else (like your frontend).
In NodeJS, I did it this way:
// ---------------------------------
// some initial configuration
var aws = require('aws-sdk');
aws.config.update({
    accessKeyId: process.env.AWS_ACCESS_KEY,
    secretAccessKey: process.env.AWS_SECRET_KEY,
    signatureVersion: 'v4',
    region: 'eu-central-1'
});

// ---------------------------------
// now say you want to fetch a URL for an object named `objectName`
var s3 = new aws.S3();
var s3_params = {
    Bucket: MY_BUCKET_NAME,
    Key: objectName,
    Expires: 60,
    ACL: 'public-read'
};
s3.getSignedUrl('putObject', s3_params, function (err, signedUrl) {
    // send signedUrl back to client
    // [...]
});
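For completeness, here is a rough sketch of wiring this into an Express route; the /signed-url path, the query parameter and the response shape are my own assumptions, not part of the original answer:
// GET /signed-url?fileName=avatar.png  ->  { signedUrl: '...' }
app.get('/signed-url', function (req, res) {
    var s3 = new aws.S3();
    var s3_params = {
        Bucket: MY_BUCKET_NAME,
        Key: req.query.fileName, // or better: a collision-resistant name, see the naming sketch above
        Expires: 60,
        ACL: 'public-read'
    };
    s3.getSignedUrl('putObject', s3_params, function (err, signedUrl) {
        if (err) return res.status(500).send(err);
        res.json({ signedUrl: signedUrl });
    });
});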
You'll probably also want to know the URL to GET your object from later (typically if it's an image). To do this, I simply removed the query string from the URL:
var url = require('url');
// ...
var parsedUrl = url.parse(signedUrl);
parsedUrl.search = null;
var objectUrl = url.format(parsedUrl);
Step 3: send the PUT request from the client
Now that your client has your File object and the signed URL, it can send the PUT request to S3. My advice in Angular's case is to just use XMLHttpRequest instead of the $http service:
var signedUrl, file;
// ...
var d_completed = $q.defer(); // since I'm working with Angular, I use $q for asynchronous control flow, but it's not mandatory
var xhr = new XMLHttpRequest();
xhr.file = file; // not necessary if you create scopes like this
xhr.onreadystatechange = function (e) {
    if (this.readyState === 4) {
        // done uploading! HURRAY!
        d_completed.resolve(true);
    }
};
xhr.open('PUT', signedUrl, true);
xhr.setRequestHeader("Content-Type", "application/octet-stream");
xhr.send(file);
Acknowledgements
I would like to thank emil10001 and Will Webberley, whose publications were very valuable to me for this issue.
You can use the ng-file-upload $upload.http method in conjunction with the aws-sdk getSignedUrl to accomplish this. After you get the signedUrl back from your server, this is the client code:
var fileReader = new FileReader();
fileReader.readAsArrayBuffer(file);
fileReader.onload = function (e) {
    $upload.http({
        method: 'PUT',
        headers: { 'Content-Type': file.type !== '' ? file.type : 'application/octet-stream' },
        url: signedUrl,
        data: e.target.result
    }).progress(function (evt) {
        var progressPercentage = parseInt(100.0 * evt.loaded / evt.total);
        console.log('progress: ' + progressPercentage + '% ' + file.name);
    }).success(function (data, status, headers, config) {
        console.log('file ' + file.name + ' uploaded. Response: ' + data);
    });
};
To do multipart uploads, or those larger than 5 GB, this process gets a bit more complicated, as each part needs its own signature. Conveniently, there is a JS library for that:
https://github.com/TTLabs/EvaporateJS
via https://github.com/aws/aws-sdk-js/issues/468
You can also use the open-source s3FileUpload directive, which provides dynamic data-binding and auto-callback functions - https://github.com/vinayvnvv/s3FileUpload

Refused to set unsafe header Connection/Content-length

I'm working on a website and I have a problem right here. On the page I'm working on, the user enters an IP address and the ports he wants to be scanned. This is done with Ajax (client side) and PHP (server side). Ajax sends the IP and the ports (one by one) to the PHP file, which returns the result for each port. The goal is for the user to see which port is being tested at the moment (in a div element), and this is where the problem is. The code runs well and tests all the ports the user asked for, but during the test it shows no port at all; it only shows the final port (after all previous ports have been tested) and the results of the ports (when there is a result), which appear in a distinct div element. This works perfectly in Firefox; in other browsers what I just described happens. The Google Chrome console says: Refused to set unsafe header "Content-length" and Refused to set unsafe header "Connection". I've been searching about this problem for days, found many suggestions and tried them, but none of them solved the problem.
Here is my code.
jquery.js
function HttpRequest(endereco, portainicio)
{
    var xmlhttp;
    var params = "endereco=" + endereco + "&" + "porta=" + portainicio;
    if (window.XMLHttpRequest) // IE7+, Firefox, Chrome, Opera, Safari
    {
        xmlhttp = new XMLHttpRequest();
    }
    else // IE6, IE5
    {
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    xmlhttp.open("POST", "/firewall/ajax", false);
    //alert(params);
    xmlhttp.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
    xmlhttp.setRequestHeader("Content-length", params.length);
    xmlhttp.setRequestHeader("Connection", "close");
    xmlhttp.send(params);
    return xmlhttp.responseText;
}

function ajaxfirewall()
{
    (...)
    var resposta;
    $("p.ip").append("<span class='end'> " + endereco + "</span>");
    for (portainicio; portainicio <= portafinal; portainicio++)
    {
        resposta = HttpRequest(endereco, portainicio);
        $("p.porta").append(" <span class='tporta'>" + resposta + "</span><br>");
    }
    return false;
}
Another thing that's really strange: do you see that alert(params); which is commented out in the HttpRequest function? If I leave it uncommented, it displays the port which is being tested, but it shows the alert and I don't want that.
Without the HTML your jquery.js is supposed to work on, this involves some guesswork (maybe you could post the relevant excerpt (hint, hint)). I would consider it possible that $("p.porta") cannot be found or that the appended HTML reacts in an unexpected way. You should try just printing your results to the console using e.g. console.log (assuming you are using Firebug or some such) in order to see what you get at what time. Maybe you will find something on the client side too.
Update
Judging from this question and its accepted answer, the Chrome behavior is actually what you should expect. The standard for XMLHttpRequest prescribes that these two headers must not be set by the client, in order to avoid request smuggling attacks. You just should not set them (even if your PHP source tells you to).
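As an illustration of that last point, here is the question's HttpRequest with the two forbidden headers simply removed; the browser computes Content-Length and manages Connection on its own:
function HttpRequest(endereco, portainicio)
{
    var params = "endereco=" + endereco + "&porta=" + portainicio;
    var xmlhttp = window.XMLHttpRequest
        ? new XMLHttpRequest()                    // IE7+, Firefox, Chrome, Opera, Safari
        : new ActiveXObject("Microsoft.XMLHTTP"); // IE6, IE5

    xmlhttp.open("POST", "/firewall/ajax", false); // still synchronous, as in the question
    // Content-Type is the only header you need to set yourself.
    xmlhttp.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
    xmlhttp.send(params);
    return xmlhttp.responseText;
}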

Best way to get definitions out of Google?

I'm trying to make a simple feature where a user can specify a term and the program fetches a definition for it and returns it. The best definition system I know of is Google's "define" keyword in search queries: if you start the query with "define " or "define:" etc., it returns very accurate and sufficient definitions. However, I have no idea how to access this information programmatically.
Google's new Custom Search Engine API doesn't show definitions and the old one gives slightly better results but is deprecated and still doesn't show the same definitions I see when I Google the term in the browser.
Failing Google, I turned to Wikipedia, which has a huge API but I still couldn't find a way to extract summaries like Google definitions.
So my question is, does anybody know how I can get this information out of Google via the API or any other means?
This is an older question asking the same thing, except the answers given there are no longer applicable, as Google Dictionary no longer exists.
Update: So I'm now going down the route of trying to scrape the definitions straight out of the page itself. Now the problem is, when I visit the page in the browser (Firefox), the definitions show up, but when I scrape the page using cheerio, they don't show up anywhere. I must mention I'm scraping the page through nitrous.io, so the page is requested from a different region and operating system than the ones I'm viewing it with in the browser, so maybe it's region-related. Will look into it further.
Update 2.0: I think maybe the definitions are loaded asynchronously and so I have no idea how to scrape them because I've never really done scraping before and I'm just a newbie :(
Update 3.0: Ok, so now I think it's not to do with the asynchronous loading but the renderer of the page. When I load this in Firefox, the page looks like this:
However, when I load it in IE (8) it looks like this:
Anybody got some insight on this?
Finally got to the answer. Had to set user agent when screen scraping. My resulting code for getting definitions via scraping:
var request = require('request'),
    cheerio = require('cheerio');

var searchTerm = 'test';

request({
    url: 'https://www.google.co.uk/search?q=define+' + searchTerm,
    headers: { "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0" }
}, function (err, resp, body) {
    $ = cheerio.load(body);
    var defineBlocks = $(".lr_dct_sf_sen");
    var numOfBlocks = (defineBlocks.length < 3) ? defineBlocks.length : 3;
    for (var i = 0; i < numOfBlocks; i++) {
        var block = defineBlocks[i].children[1].children[0]; // font-size:small level
        process(block);

        function process(block) {
            for (var i = 0; i < block.children.length; i++) {
                var line = block.children[i];
                if ("style" in line.attribs) { // main text
                    exampleStr = "";
                    for (var k = 0; k < line.children.length; k++) {
                        exampleStr += line.children[k].children[0].data;
                    }
                    console.log(exampleStr);
                } else if ("class" in line.attribs) { // example
                    console.log("\"" + line.children[1].children[0].data + "\"");
                } else {
                    // nothing i want
                }
            }
        }
    }
});

Windows Phone webclient caching "issue"?

I am trying to call the same URL but with different values. The issue is that the URL is correct and contains the new values, but when I download it (WebClient.DownloadStringTaskAsync), it gives me the previous call's result.
I have tried adding a no-cache header, attaching a random value to the call, and an If-Modified-Since header. However, it is still not working.
Any help will be much appreciated, because I have tried everything.
uri: + "&junk=" + Guid.NewGuid());
client.Headers["Cache-Control"] = "no-cache";
client.Headers[HttpRequestHeader.IfModifiedSince] = DateTime.UtcNow.ToString();
var accessdes = await client.DownloadStringTaskAsync(uri3);
So here my uri3 contains the latest values, but when I hover over accessdes, it contains the result as if I were making the old uri3 call with the previously set data.
I saw one friend who was attaching a random GUID to the URL in order to prevent the OS from caching its content. For example:
say the URL were http://www.ms.com/getdatetime and the OS is caching it.
Our solution was adding a GUID to create "sort of" a new URL each time; as an example, the previous URL would then look like: http://www.ms.com/getdatetime?cachebuster=21EC2020-3AEA-4069-A2DD-08002B30309D
(see more about cache buster : http://www.adopsinsider.com/ad-ops-basics/what-is-a-cache-buster-and-how-does-it-work/ )

MooTools AJAX Request on unload

I'm trying to lock a row in a db table when a user is editing the entry.
So there's a field lock in the table that I set to 1 on page load with PHP.
Then I was trying to unlock the entry (set it to 0) when the page is unloaded.
This is my approach. It works fine in IE but not in Firefox, Chrome etc.
The window.onbeforeunload handler works in all browsers, I tested that.
They just don't perform the Request.
BUT
if I simply put an alert after req.send(); it works in some browsers, but not Safari or Chrome. So I tried putting something else after it, just so that there's other stuff to do after the request, but it doesn't work.
function test() {
    var req = new Request({
        url: 'inc/ajax/unlock_table.php?unlock_table=regswimmer&unlock_id='
    });
    req.send();
    alert('bla'); // ONLY WORKS WITH THIS !?!?!?
}
window.onbeforeunload = test;
I've already tried different ways to do the request but nothing seems to work. And the request itself works, just not in this constellation.
ANY help would be appreciated!
Thanks
The request is asynchronous by default. This means the browser fires it off and does not wait for it to complete, which may or may not happen (have time to finish) before the page unloads. By placing the alert there you ensure that there is sufficient time for the request to complete.
Basically, you may be better off trying one of these things:
add async: false to the request object options. This will ensure the request completes before the page is unloaded (see the sketch after this list).
use an image instead, like a tracking pixel.
move over to method: "get", which is a bit faster as it does not contain extra headers and cookie info, and may complete better (revert to this if the synchronous request delays unloading too much).
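A minimal sketch of the first option, assuming the same unlock endpoint as in the question (someid stands for whatever entry ID you are unlocking):
function test() {
    new Request({
        url: 'inc/ajax/unlock_table.php?unlock_table=regswimmer&unlock_id=' + someid,
        method: 'get',
        async: false // block until the unlock request has completed
    }).send();
}
window.addEvent('beforeunload', test);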
You can do the image like so (it will also arrive as $_GET):
new Element("img", {
src: "inc/ajax/unlock_table.php?unlock_table=regswimmer&unlock_id=" + someid + "&seed=" + $random(0, 100000),
styles: {
display: "none"
}
}).inject(document.body);
Finally, use window.addEvent("beforeunload", test); or you may mess up MooTools' internal garbage collection.
