So I have been at war with Google these last couple of days.
I created a sitemap (generated dynamically) and routed to it using a reverse proxy in nginx.
Happy with myself, I uploaded the URL to Google Search Console.
I got an error right away, as seen in the picture: "Sitemap is HTML".
After digging around for a while, it turned out our pre-renderer had picked up the request and served Google a pre-rendered version of the XML file, hence the HTML.
But even after fixing this, making sure no request for sitemap.xml goes to our pre-renderer, Google still gives the same error message.
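For reference, the exception in our nginx config looks roughly like this (the upstream name here is a placeholder, not our actual setup):

    # exact-match location so sitemap.xml always bypasses the pre-renderer
    location = /sitemap.xml {
        proxy_pass http://app_backend;
    }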
I have tried removing it and adding it again in Search Console multiple times on different days, I have tried waiting, I have tried serving it under another name (sitemap2.xml), and I have tried adding an actual XML file instead of the dynamic one. Nothing works!
Since disabling the pre-renderer, I have verified the XML file with multiple validator sites, and every one gives me an OK.
It's as though it's ignoring my requests to re-check the file.
Sitemap location: https://www.tirex.se/sitemap.xml
Any tips at this point would be much appreciated!
I have only been developing for a few weeks now. I bought a domain, but when I upload the files to the live site, the website looks different from what I uploaded. This gets fixed when I clear my cache. The problem is that my visitors see the page one way, and after I update it they still see the previous version!
Is there any possible solution for this? I don't want my visitors to have to clear their cache every time I make a change to my website!
This is most likely due to CSS caching: the browser is loading a cached version. You can control the cache time in a few ways; ETags and .htaccess (on Apache) are the most common.
A very simple trick is to add a GET-style parameter to the end of your stylesheet URL (where you load your main style in the head of the document), like this:
main.css?v=2
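In the head of your document, that looks like this (the file name is just an example):

    <link rel="stylesheet" type="text/css" href="main.css?v=2">

Bump the number whenever the CSS changes; the browser treats the new URL as a different resource and fetches a fresh copy.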
It's very important for the website I'm working on to be functional offline. I'm using a cache manifest to store all the files in the application cache, so that takes care of that and all is well.
BUT, as I've read and noticed myself, the browser first shows the cached version of the site before checking online for an update. Hitting refresh then reloads from the cache again, this time with the newly cached files (or whatever it had time to update, for the swift refreshers).
I'm aware of this fix: http://www.html5rocks.com/en/tutorials/appcache/beginner/, where the user is told an update is available and is asked to refresh the page. Not a bad method, but still a bit rough on the user experience.
Is there any other way to force the browser to show the most up-to-date files when online? Would manually cache-busting all the files AND using a cache manifest fix this problem, or would the two conflict and break the offline functionality?
I found something that works well for me:
The URL linking to the web page contains a parameter. Whenever there is a change to the page or its related files, the URL is changed to something like this: http://www.mywebsite.com/mypage.html?v=3, where v=3 is bumped with each update.
This is a longer fix to implement (finding every page affected by a change and updating all of their cache-busting links), but the pages at least show what they're supposed to on first load, and the cache manifest still loads the update for offline viewing.
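One detail that helps here: the browser re-downloads everything listed in the manifest whenever the manifest file itself changes by even a byte, so a version comment inside it does the job. A rough sketch (file names invented for illustration):

    CACHE MANIFEST
    # v3 - bump this comment on every deploy to invalidate the old cache
    mypage.html?v=3
    css/main.css?v=3
    js/app.js?v=3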
I've got a web service that, like most others, uses JS and CSS files. I use the old trick of appending a version number to the JS and CSS files, like ?v=123, and that gets changed every time we update the service in production.
Now, this works fine in all browsers except Chrome. Chrome seems to prefer its cached version over getting the new one, and therefore seems to ignore the appended variable. In some cases, forcing it to refresh the cache (Cmd+R / Ctrl+F5) wasn't enough, so I had to go into the options and clear the cache for it to load the new content.
Has anyone experienced this issue with Chrome? And if so, what was the resolution to the problem?
Chrome should certainly treat requests with varying query strings as different requests; a cached result for style.css?v=123 should never be used for style.css?v=124. If you're seeing different behavior, please file a bug at http://new.crbug.com/ and post the bug ID here.
That said, I'd first check to see whether the page was cached longer than you expected. If a new version of the page itself wasn't downloaded, then it would still be requesting ?v=123 as the HTML wouldn't have changed. If you're sending long-lived cache headers with the page, it's certainly possible that Chrome is caching it more aggressively than you expected. If that's the behavior you're seeing, please star http://crbug.com/8742 for updates.
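If long-lived headers on the HTML itself turn out to be the cause, one workaround (a sketch assuming Apache with mod_headers enabled; this isn't from the original posts) is to make the HTML always revalidate while leaving the versioned assets cacheable:

    # make HTML revalidate on every load so new ?v= references are picked up
    <FilesMatch "\.html$">
        Header set Cache-Control "no-cache, must-revalidate"
    </FilesMatch>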
I have had the same experience.
You can use Ctrl + Shift + R for a cache-bypassing reload in both Chrome and Firefox.
I have had this experience as well.
I run a membership site which displays content such as "You must be logged in as a Gold member in order to see this content" if the visitor is not logged in or is trying to view content not allowed by their membership level. But even when the user is logged in, they would still see "You need to log in", due to Google Chrome's aggressive caching. In Firefox, however, it works fine as I test logging in and out of all 5 membership levels, each displaying the proper content.
While Chrome's caching problem can be worked around by clearing the cache every time the user logs in or out, it would be really annoying to take that approach.
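A cleaner fix than clearing caches is to mark the membership pages as uncacheable in the first place. A minimal sketch, assuming the site runs on PHP (the posts above don't say what the stack is); it must run before any output is sent:

    <?php
    // per-user pages must never be cached by the browser or a proxy
    header('Cache-Control: no-store, no-cache, must-revalidate, max-age=0');
    header('Pragma: no-cache');
    header('Expires: Thu, 01 Jan 1970 00:00:00 GMT');
    ?>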
If I understand correctly, a browser caches images, JS files, etc. based on the file name. So there's a danger that if such a file is updated (on the server), the browser will use a stale cached copy instead.
A workaround for this problem is to rename all files (as part of the build) such that the file name includes an MD5 hash of its contents, e.g.
foo.js -> foo_AS577688BC87654.js
me.png -> me_32126A88BC3456BB.png
However, in addition to renaming the files themselves, all references to these files must be changed. For example, a tag such as <img src="me.png"/> should be changed to <img src="me_32126A88BC3456BB.png"/>.
Obviously this can get pretty complicated, particularly when you consider that references to these files may be dynamically created within server-side code.
Of course, one solution is to completely disable caching in the browser (and any caches between the server and the browser) using HTTP headers. However, having no caching at all creates its own set of problems.
Is there a better solution?
Thanks,
Don
The best solution seems to be to version filenames by appending the last-modified time.
You can do it this way: add a rewrite rule to your Apache configuration, like so:
RewriteRule ^(.+)\.(.+)\.(js|css|jpg|png|gif)$ $1.$3
This will rewrite any "versioned" URL to the "normal" one internally. The idea is to keep your file names the same but still benefit from caching. Appending a parameter to the URL instead is not optimal with some proxies, which don't cache URLs with query strings.
Then, instead of writing:
<img src="image.png" />
Just call a PHP function:
<img src="<?php versionFile('image.png'); ?>" />
With versionFile() looking like this:
function versionFile($file){
    // insert the file's last-modified time just before the extension,
    // e.g. /images/image.png -> /images/image.1234567890.png
    // (building from pathinfo() parts avoids mangling names with more
    // than one dot, such as jquery.min.js, which str_replace would break)
    $path = pathinfo($file);
    $ver = filemtime($_SERVER['DOCUMENT_ROOT'].'/'.ltrim($file, '/'));
    echo $path['dirname'].'/'.$path['filename'].'.'.$ver.'.'.$path['extension'];
}
And that's it! The browser will ask for image.123456789.png, Apache will rewrite that to image.png, so you benefit from caching in all cases and never serve an out-of-date file, without having to bother with manual filename versioning.
You can see a detailed explanation of this technique here: http://particletree.com/notebook/automatically-version-your-css-and-javascript-files/
Why not just add a querystring "version" number and update the version each time?
foo.js -> foo.js?version=5
There is still a bit of work during the build to update the version numbers, but file names don't need to change.
Renaming your resources is the way to go, although we use a build number and embed that into the file name instead of an MD5 hash:
foo.js -> foo.123.js
as it means that all your resources can be renamed in a deterministic fashion and resolved at runtime.
We then use custom controls to generate links to these resources on page load, based on the build number, which is stored in an app setting.
We followed a similar pattern to PJP, using Rails and Nginx.
We wanted user avatar images to be browser cached, but on an avatar's change we needed the cache to be invalidated ASAP.
We added a method to the avatar model to append a timestamp to the file name:
return "/images/#{sourcedir}/#{user.login}-#{self.updated_at.to_s(:flat_string)}.png"
Everywhere avatars were used in the code, we referenced this method rather than a URL. In the Nginx configuration, we added this rewrite:
rewrite "^/images/avatars/(.+)-[\d]{12}.png" /images/avatars/$1.png;
rewrite "^/images/small-avatars/(.+)-[\d]{12}.png" /images/small-avatars/$1.png;
This meant if a file changed, its URL in the HTML changed, so the user's browser made a new request for the file. When the request reached Nginx, it got rewritten to the simple name of the file.
I would suggest using ETag-based caching in this situation; see http://en.wikipedia.org/wiki/HTTP_ETag. You can then use the hash as the ETag. A request will still be submitted for each resource, but the browser will only download items that have changed since the last download.
Read up on your web server / platform docs on how to use ETags properly; most decent platforms have built-in support.
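To illustrate (URL and hash invented, reusing the hash from the question), the revalidation round trip looks like this:

    GET /static/foo.js HTTP/1.1
    If-None-Match: "AS577688BC87654"

    HTTP/1.1 304 Not Modified
    ETag: "AS577688BC87654"

The 304 response carries no body, so when nothing has changed only a handful of header bytes cross the wire.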
Most modern browsers send an If-Modified-Since header whenever they re-request a cacheable resource they already hold a copy of. However, not all browsers support If-Modified-Since.
There are three ways to "force" the browser to fetch a fresh copy of a cached resource.
Option 1: Add a query string with a version number, e.g. src="script.js?ver=21". The downside is that many proxy servers won't cache a resource with a query string. It also requires site-wide updates for every change.
Option 2: Use a naming scheme for your files, e.g. src="script083010.js". The downside, as with option 1, is that this also requires site-wide updates whenever a file changes.
Option 3: Perhaps the most elegant solution: simply set up the caching headers Last-Modified and Expires on your server. The main downside is that users may have to re-download resources that expired but never changed. Additionally, the Last-Modified header does not work well when content is served from multiple servers.
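To make option 3 concrete, the response headers for a static file would look something like this (dates invented):

    HTTP/1.1 200 OK
    Last-Modified: Mon, 30 Aug 2010 10:00:00 GMT
    Expires: Tue, 30 Aug 2011 10:00:00 GMT

Until the Expires date passes, the browser can reuse its copy without contacting the server at all; after that, it revalidates with If-Modified-Since and gets a 304 if the file is unchanged.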
Here are a few resources to check out: Yahoo, Google, AskApache.com
This is really only an issue if your web server sets a far-future "Expires" header (setting something like ExpiresDefault "access plus 10 years" in your Apache config). Otherwise, a browser will make a conditional GET, based on the modified time and/or the Etag. You can verify what is happening on your site by using a web proxy or an extension like Firebug (on the Net panel). Your question doesn't mention how your web server is configured, and what headers it is sending with static files.
If you're not setting a far-future Expires header, there's nothing special you need to do. Your web server will usually handle conditional GETs for static files based on last modified time just fine. If you are setting a far-future Expires header then yes, you need to add some sort of version to the file name like your question and the other answers have mentioned already.
I have also been thinking about this for a site I support, where it would be a big job to change all the references. I have two ideas:
1.
Set distant cache expiry headers and apply the changes you suggest for the most commonly downloaded files. For other files, set the headers so they expire after a very short time, e.g. 10 minutes. Then, if you have 10 minutes of downtime when updating the application, the caches will have expired by the time users return to the site. General site navigation should also improve, as the files will only need downloading every 10 minutes, not on every click.
2.
Each time a new version of the application is deployed, deploy it to a different context that contains the version number, e.g. www.site.com/app_2_6_0/. I'm not really sure about this one, as users' bookmarks would be broken with each update.
I believe that a combination of solutions works best:
Setting cache expiry dates for each type of resource (image, page, etc.) appropriately for that resource. For example:
Your static "About", "Contact", etc. pages probably aren't going to change more than a few times a year, so you could easily put a cache time of a month on those pages.
Images used on these pages could have eternal cache times, as you are more likely to replace an image than to change one.
Avatar images might have an expiry time of a day.
Some resources need modified dates in their names. For example avatars, generated images, and the like.
Some things should never be cached: new pages, user content, etc. In these cases you should cache on the server, but never on the client side.
In the end, you need to carefully consider each type of resource to determine what cache time to instruct the browser to use, and always be conservative if you are unsure. You can increase the time later, but it's much more painful to un-cache something.
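As a sketch of what this might look like (assuming Apache with mod_expires enabled; the times mirror the suggestions above):

    ExpiresActive On
    # static pages: change a few times a year at most
    ExpiresByType text/html "access plus 1 month"
    # page images: replaced (and renamed) rather than edited, so cache long
    ExpiresByType image/png "access plus 1 year"

Anything per-user or frequently changing would then be sent with a no-cache Cache-Control header instead.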
You might want to check out the approach taken by the Grails "uiperformance" plugin, which you can find here. It does a lot of the things you mention, but automates them (setting the expiry time far in the future, then incrementing version numbers when files change).
So if you're using Grails, you get this stuff for free. If you are not, maybe you can borrow the techniques employed.
Also, borrowed from the ui-performance page: read the following 14 rules.
ETags seemingly provide a solution for this...
As per http://httpd.apache.org/docs/2.0/mod/core.html#fileetag, we can have the server generate ETags based on file size (instead of time/inode/etc.). The generated value should then be constant across multiple server deployments.
Just enable it in your Apache config (/etc/apache2/apache2.conf):
FileETag Size
& you should be good!
That way, you can simply reference your images as <img src='/path/to/foo.png' /> and still use all the goodness of HTTP caching.