Play Framework: caching on templates

I'm using Play Framework (v1.1.1) and I have a question about the #{cache} tag.
I suppose the question would be "when should I use it?", but I realise that's quite generic.
So beyond that, I would like to know if anyone has checked its behaviour with JavaScript. I understand that it caches the output of other tags embedded in its body, but will it also cache JavaScript? More specifically, if I include some script tags that reference external resources (like a CDN), will the referenced file get cached too, or only the tag itself?

The purpose of the cache tag is to cache the output that the server sends to the client. JavaScript, images and any other resources referenced by the code sent to the client side are not cached, unless specifically told to do so by the headers set in the <head> tag of your HTML.
By default, Play (if you extend main.html) does not specify any Cache-Control headers, so your scripts will be cached according to the browser's standard caching policy. That should be "no-cache" according to the HTTP spec, but I'm doubtful whether that is actually the case.
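To make that concrete, here is a minimal sketch of how the tag is typically used in a Play 1.x template; the cache key, duration, markup and CDN URL are all made up for illustration:

#{cache 'product-list', for:'1h'}
    <ul>
    #{list items:products, as:'product'}
        <li>${product.name}</li>
    #{/list}
    </ul>
    <script src="https://cdn.example.com/widget.js"></script>
#{/cache}

A <script src="..."> tag inside the block is cached only as markup: the browser still requests widget.js from the CDN and caches it (or not) according to whatever headers the CDN sends back.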

Related

Is output of debugClientLibs flag cached in CQ/AEM

I've been using the debugClientLibs flag in my AEM pages (helpful for debugging clientlib-related issues), like this: localhost:4502/content/geometrixx/en.html?debugClientLibs=true.
Recently I was seeing a caching-related issue with JS. I noticed that when using the debugClientLibs flag, the no-cache header was not included in the request headers of the individual JS files.
It does not make sense to cache these individual files, as that would defeat the purpose of debugging clientlibs (I would not want to see cached JS and CSS files when I'm using the debugClientLibs flag on my pages). Attaching a screenshot of the request and response headers I got.
My Question here is:
Are these individual clientlib files cached on the browser ?
Short answer - it depends.
Every browser has its own implementation of networking and caching rules. Response headers are hints to the browser to help it be more efficient, but a browser may choose to do its own thing. Even more distracting, the behavior may change between versions of a given browser. Further, even if the browser's default behavior is to follow (or ignore) such headers, the user may configure different behavior. So don't assume anything, especially globally.

How can I exclude pages created from a specific template from the CQ5 dispatcher cache?

I have a specific Adobe CQ5 (5.5) content template that authors will use to create pages. I want to exclude any page that is created from this template from the dispatcher cache. As I understand it currently, the only way I know to prevent caching is to configure dispatcher.any to not cache a particular URL. But in this case, the URL isn't known until a web author uses the template to create a page. I don't want to have to go back and modify dispatcher.any every time a page is created--or at least I want to automate this if there is no other way. I am using IIS for the dispatcher. The reason I don't want to cache the pages is because the underlying JSPs that render the content for these pages produce dynamic content, and the pages don't use querystrings and won't carry authentication headers. The pages will be created in unpredictable directories, so I don't know the URL pattern ahead of time.
How can I configure things so that any page that is created from a certain template will be automatically excluded from the dispatcher cache?
It seems like CQ ought to have some mechanism to respect HTTP response/caching headers. If the HTTP response headers specify that the response shouldn't be cached, it seems like the dispatcher shouldn't cache it--regardless of what dispatcher.any says. This is the CQ5 documentation I have been referencing.
I don't know about the IIS version of the Dispatcher, but certainly with the Apache module, if you add a custom HTTP header "Dispatcher: nocache" it will not cache the page in the Dispatcher. You would need to change the code to add this, which would be something like:
response.setHeader("Dispatcher", "nocache");
It might also work as a meta tag in the HTML, but I've not tried this.
This is documented here: http://dev.day.com/content/kb/home/Dispatcher/faq-s/DispatcherNoCache.html
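For the template-based exclusion the question asks about, a rough sketch as a plain servlet filter (the template path and the attribute used to look it up are assumptions, not CQ API; in a real deployment you would read the page's cq:template property via the CQ/Sling APIs):

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: pages built from one particular template are marked
// non-cacheable for the Dispatcher. The template path and the way it is
// looked up are placeholders, not CQ API.
public class NoDispatcherCacheFilter implements Filter {
    private static final String DYNAMIC_TEMPLATE = "/apps/myapp/templates/dynamicpage"; // assumed path

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;

        // Assume earlier code resolved the page's cq:template into a request attribute.
        Object template = req.getAttribute("resolved.cq.template");

        if (DYNAMIC_TEMPLATE.equals(template)) {
            response.setHeader("Dispatcher", "nocache");
        }
        chain.doFilter(req, res);
    }

    public void init(FilterConfig config) {}
    public void destroy() {}
}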
You might use cache control tags in the template's head. See info on the Pragma and Cache-Control meta tags here: HTTP Cache-Control.

Is it still necessary to check if a visitor's browser supports Ajax?

If I have Ajax code on my website:
1) Is it still necessary to check whether a user's browser supports Ajax? Don't they all?
2) If so, is there a non-ActiveX approach to check this? I'd really like to avoid ActiveX, as some security filters could flag it as potential malware.
It's not a question of whether browsers support 'Ajax'; that's just a pithy term used to describe the process of a client retrieving data from a server via JavaScript, typically asynchronously, and typically using the XMLHttpRequest object.
This object is not defined by IE 5-6, so you have to write code to compensate, or use a library such as jQuery which encapsulates this.
So, what you should be asking is whether your site degrades gracefully if JavaScript is not available on the client, i.e. can users still get to the content without JavaScript?

Lazy HTTP caching

I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted and is loaded over an internet connection, the page loads are slow.
I would like to implement some kind of lazy caching mechanism so that, as people browse the pages, the pages and the resources they reference get cached, and subsequent loads of the same page are instant.
I considered using HTML5 offline caching - but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png, and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. The version numbers can be created by appending a build number, using a CRC of the file contents, or in many other ways.
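As a rough illustration (the class and method names here are made up), the version token could be derived from a CRC of the file's current contents:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper: maps "/images/button.png" to "/images/button_v<crc>.png",
// where <crc> is a checksum of the file's current contents.
public final class VersionedUrls {
    public static String versionedUrl(Path webRoot, String resourcePath) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(webRoot.resolve(resourcePath.substring(1))));
        int dot = resourcePath.lastIndexOf('.');
        return resourcePath.substring(0, dot)
                + "_v" + Long.toHexString(crc.getValue())
                + resourcePath.substring(dot);
    }
}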
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. This means that all resources loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
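One possible wiring, sketched as a generic servlet filter with illustrative names: strip the version token when serving the file and attach the far-future headers at the same time.

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter mapped to versioned static URLs, e.g. /images/button_v1a2b3c.png:
// it attaches far-future caching headers and forwards to the unversioned file.
public class StaticVersionFilter implements Filter {
    private static final long TEN_YEARS_MS = 10L * 365 * 24 * 60 * 60 * 1000;

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Cache "forever": the URL changes whenever the content changes.
        response.setDateHeader("Expires", System.currentTimeMillis() + TEN_YEARS_MS);
        response.setHeader("Cache-Control", "public, max-age=315360000");

        // Strip the "_v<token>" part so the real file on disk is served.
        String realPath = request.getServletPath().replaceFirst("_v[0-9a-f]+\\.", ".");
        request.getRequestDispatcher(realPath).forward(req, res);
    }

    public void init(FilterConfig config) {}
    public void destroy() {}
}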
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Max-Age) to reload the parent page from cache for a given period of time before checking the server.
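A sketch of the ETag route for a parent page, with a deliberately naive way of computing the tag:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet for a "parent" page: answers conditional requests with
// 304 Not Modified when the rendered content has not changed.
public class ParentPageServlet extends HttpServlet {

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String html = renderPage();  // however the page is actually built
        String etag = "\"" + Integer.toHexString(html.hashCode()) + "\"";

        if (etag.equals(request.getHeader("If-None-Match"))) {
            response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);  // browser re-uses its cached copy
            return;
        }
        response.setHeader("ETag", etag);
        response.setContentType("text/html;charset=UTF-8");
        response.getWriter().write(html);
    }

    private String renderPage() {
        return "<html><body>kiosk page</body></html>";  // placeholder content
    }
}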
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk-- in that case you'd have total, centralized control over how caching is done.

Google Search optimisation for ajax calls

I have a page on my site which has a list of things which gets updated frequently. This list is created by calling the server via jsonp, getting json back and transforming it into html. Fast and slick.
Unfortunately, Google isn't able to index it. After reading up on how to get this done according to Google's AJAX crawling guide, I am a bit confused and need some clarification and confirmation:
Only the Ajax pages need to implement the rules, right?
I currently have a rest url like
[site]/base/junkets/browse.aspx?page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
this would need to become something like:
[site]/base/junkets/browse.aspx#page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
And when Google calls it like this:
[site]/base/junkets/browse.aspx#!page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
I would have to deliver the html snapshot.
Why replace the ? with a #?
Creating HTML snapshots seems very cumbersome. Would it suffice to just serve simple links? In my case I would be happy if Google would only index the things pages.
It looks like you've misunderstood the AJAX crawling guide. The #! notation is to be used on links to the page your AJAX application lives within, not on the URL of the service your application makes calls to. For example, if I access your app by going to example.com/app/, then you'd make the page crawlable by instead linking to example.com/app/#!page=1.
Now when Googlebot sees that URL in a link, instead of going to example.com/app/#!page=1 – which means issuing a request for example.com/app/ (recall that the hash is never sent to the server) – it will request example.com/app/?_escaped_fragment_=page=1. If _escaped_fragment_ is present in a request, you know to return the static HTML version of your content.
Why is all of this necessary? Googlebot does not execute script (nor does it know how to index your JSON objects), so it has no way of knowing what ends up in front of your users after your scripts run and content is loaded. So, your server has to do the heavy lifting of producing an HTML version of what your users ultimately see in the AJAXy version.
So what are your next steps?
First, either change the links pointing to your application to include #!page=1 (or whatever), or add <meta name="fragment" content="!"> to your app's HTML. (See item 3 of the AJAX crawling guide.)
When the user changes pages (if this is applicable), you should also update the hash to reflect the current page. You could simply set location.hash='#!page=n';, but I'd recommend using the excellent jQuery BBQ plugin to help you manage the page's hash. (This way, you can listen to changes to the hash if the user manually changes it in the address bar.) Caveat: the currently released version of BBQ (1.2.1) does not support AJAX crawlable URLs, but the most recent version in the Git master (1.3pre) does, so you'll need to grab it here. Then, just set the AJAX crawlable option:
$.param.fragment.ajaxCrawlable(true);
Second, you'll have to add some server-side logic to example.com/app/ to detect the presence of _escaped_fragment_ in the query string, and return a static HTML version of the page if it's there. This is where Google's guidance on creating HTML snapshots might be helpful. It sounds like you might want to pursue option 3. You could also modify your service to output HTML in addition to JSON.
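The detection itself is small in any stack; sketched here as a generic Java servlet (the question's site is ASP.NET, so treat this as pseudocode for that environment, and the JSP paths are made up):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical front controller: if Googlebot asks for ?_escaped_fragment_=...,
// render the same list as static HTML; otherwise serve the normal AJAX shell.
public class AppEntryServlet extends HttpServlet {

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String fragment = request.getParameter("_escaped_fragment_");

        if (fragment != null) {
            // e.g. fragment = "page=1&rows=18&sidx=ScoreAll&sord=desc"
            request.setAttribute("state", fragment);
            request.getRequestDispatcher("/WEB-INF/snapshot.jsp").forward(request, response);
        } else {
            request.getRequestDispatcher("/WEB-INF/app.jsp").forward(request, response);
        }
    }
}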
I've more or less given up on this. There really seems to be no alternative to generating the HTML on the server and delivering it in the HTML body if you want Google to index your directory.
I even tried adding a section wrapping a .NET user control which implemented a simple HTML version of the directory. But Google managed to ignore that too.
So in the end my directory has been de-ajaxified. :(
