Application level caching in AEM

We are working on a site in AEM 6.1 that has news and events content; most pages display dynamic lists of recent and related news/events based on tagging. We are using the dispatcher. Please suggest some caching techniques that could be implemented at the application level, apart from the dispatcher. Thanks.

The aim of caching on the dispatcher is to reduce hits on your app server and serve as much as possible from the web server, in short, to improve response times. But in some cases we can't cache too much on the web server, because the results produced by the app server change frequently.
On the app server, the following solutions can be implemented on top of the dispatcher to get results quickly.
Make sure the content hierarchy where you ingest news items keeps the number of articles per node as small as possible. Divide your hierarchy based on the following structure: Year >> Month >> Day >> Hour (the hour level can be dropped if the content volume is low) >> news items.
With this structure in place, write path-based queries so that you don't have to traverse the whole content hierarchy.
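As an illustration, a minimal sketch of such a path-restricted lookup using JCR-SQL2; the site path and the date-based branch are hypothetical and would be computed from the current date:

import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

public class RecentNewsQuery {

    // Searches only today's branch instead of the whole news hierarchy.
    public QueryResult findTodaysNews(Session session) throws Exception {
        QueryManager qm = session.getWorkspace().getQueryManager();
        String sql2 = "SELECT * FROM [cq:Page] AS p "
                + "WHERE ISDESCENDANTNODE(p, '/content/site/news/2015/06/18')";
        return qm.createQuery(sql2, Query.JCR_SQL2).execute();
    }
}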
CQ also has the concept of a transient node: for each news item created in CQ, update a transient node with a reference to the newly created item. For recent news you then don't have to traverse the content structure at all; just read the transient node, which holds references to the newest news items.
You could also write a cron job that executes in the background and takes care of collating views such as top or recent news.

To complement Rupesh's answer: definitely use the dispatcher cache as much as you can, and for local caching strategies in AEM try the Guava cache. It is a very good and easy-to-use tool, and there is a lot of information on how you can set it up and use it for your specific needs. Hope it helps.
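A minimal sketch of what a Guava cache could look like here; the key type, size limit, expiry, and the repository query it delegates to are placeholder assumptions:

import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class RecentNewsCache {

    // Caches recent-news page paths per tag, expiring five minutes after write.
    private final LoadingCache<String, List<String>> cache = CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .build(new CacheLoader<String, List<String>>() {
                @Override
                public List<String> load(String tag) {
                    return queryRecentNewsFor(tag); // hypothetical repository query
                }
            });

    public List<String> recentNews(String tag) throws Exception {
        return cache.get(tag); // loads on a miss, otherwise serves from memory
    }

    private List<String> queryRecentNewsFor(String tag) {
        // ... run the JCR query here and return page paths ...
        return Collections.emptyList();
    }
}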

I would suggest the following:
For recent news/events, write a scheduler
(https://sling.apache.org/documentation/bundles/scheduler-service-commons-scheduler.html) that computes the list of recent news/events and writes it to a specific node as properties, for example:
/tmp/recent
news [/path/to/news1,/path/to/news2]
events [/path/to/event1,/path/to/event2]
Most recent always at the end of the array. Your code needs to cap the array at the maximum number of recent items you want to keep.
Let's say you want to keep the last 5 changed pages and a 6th page is changed; then you just drop the oldest entry and push(new_page_path).
This could run once a day or at the frequency which you feel is the best depending on your requirements.
If you need instant updates, you can additionally write a listener that fires when a page is changed/deleted and updates the recent list. In this case I would suggest putting the code that deals with updating the recent list into a service and using that service in both the scheduler and the listener.
The listener and scheduler need to run on both author and publish, and on publish they should trigger dispatcher cache invalidation for /tmp/recent afterwards.
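A rough sketch of the scheduler half, using the Felix SCR annotations common in the AEM 6.1 era; the cron expression, class name, and property values are illustrative, and the actual query is omitted:

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Reference;
import org.apache.felix.scr.annotations.Service;
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;

@Component(immediate = true)
@Service(Runnable.class)
@Property(name = "scheduler.expression", value = "0 0 0 * * ?") // once a day at midnight
public class RecentListScheduler implements Runnable {

    @Reference
    private ResourceResolverFactory resolverFactory;

    @Override
    public void run() {
        ResourceResolver resolver = null;
        try {
            resolver = resolverFactory.getAdministrativeResourceResolver(null);
            Resource recent = resolver.getResource("/tmp/recent");
            if (recent != null) {
                // Compute the recent lists (query omitted) and persist them
                // as array properties on /tmp/recent.
                ModifiableValueMap props = recent.adaptTo(ModifiableValueMap.class);
                props.put("news", new String[] { "/path/to/news1", "/path/to/news2" });
                props.put("events", new String[] { "/path/to/event1", "/path/to/event2" });
                resolver.commit();
            }
        } catch (Exception e) {
            // log and let the next scheduled run retry
        } finally {
            if (resolver != null) {
                resolver.close();
            }
        }
    }
}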
In order to render the recent list without having to invalidate whole pages, I would suggest using SSI: have a component in your page that renders an SSI include of /tmp/recent.news.html or /tmp/recent.events.html, depending on whether you want to render recent news or events.
Give the node /tmp/recent a resourceType that handles the "news" and "events" selectors, and implement that resourceType to render the content.
For the related news/events:
Use the Tag Manager's (https://docs.adobe.com/docs/en/cq/5-6-1/javadoc/com/day/cq/tagging/TagManager.html) "find" method to look up all news/events that have the same tag as the current page. I assume your news and events pages have a dedicated template or resource type.
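A minimal sketch of that lookup; the search root and the tag ID are illustrative:

import java.util.Iterator;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import com.day.cq.tagging.TagManager;

public class RelatedLookup {

    // Returns every resource below /content tagged with the given tag,
    // e.g. tagId = "ns:tag1".
    public Iterator<Resource> findTagged(ResourceResolver resolver, String tagId) {
        TagManager tagManager = resolver.adaptTo(TagManager.class);
        return tagManager.find("/content", new String[] { tagId });
    }
}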
Also, I would suggest having a dedicated component that includes that content via SSI. Let's say your page has 2 tags, ns/tag1 and ns/tag2; then you could perform the SSI includes like this:
SSI include /etc/tags/ns/tag1.related_news.html
SSI include /etc/tags/ns/tag1.related_events.html
SSI include /etc/tags/ns/tag2.related_news.html
SSI include /etc/tags/ns/tag2.related_events.html
depending on what you want to include.
Write a component under /apps/cq/tagging/components/tag (sling:resourceSuperType = /libs/cq/tagging/components/tag) that provides the rendering for the "related_news" and "related_events" selectors and lists all related pages.
The advantage of this approach is that the related list is shared per tag, and whenever the tag is changed/deleted the cache gets invalidated automatically.
In both cases (recent and related) configure the dispatcher to cache the output.

REST API for main page - one JSON or many?

I'm providing a RESTful API to my (JS) client from a (Java Spring) server.
The main site page contains a number of logical blocks (news, latest comments, some trending stuff), each of which has a corresponding entity on the server. Which is the right way to go: handling one request like
/api/main_page/ ->
{
    news: { ... },
    comments: { ... },
    ...
}
or letting the client do a few requests like
/api/news/
/api/comments/
...
I know that in general it's better to have one large request/response, but does that hold in this situation as well?
Ideally, you should have different API calls for fetching the individual configurable content blocks of the page from the same API.
This way your content blocks are loosely coupled to each other.
You can extend, port (to a new framework), and modify them independently at any time you want.
This becomes extremely useful as the application grows.
Switching off a feature is fairly easy in this case.
A/B testing is also easy in this case.
Writing automation is also very easy.
Overall it helps in reducing the testing effort.
But if you really want to fetch this in one call, you should add an additional parameter to the request; when the server sees that parameter, it adds the additional independent JSON blocks to the response by calling its own methods in the business-logic layer.
And if speed is your concern, try caching these calls on the server for some time (how long depends on the type of application).
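A minimal Spring MVC sketch of that one-call variant; the endpoint path comes from the question, but the include parameter and the block services are hypothetical stand-ins for the real business-logic layer:

import java.util.LinkedHashMap;
import java.util.Map;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MainPageController {

    // Each block keeps its own service; the aggregate endpoint only composes them.
    private final NewsService newsService = new NewsService();
    private final CommentService commentService = new CommentService();

    @RequestMapping("/api/main_page")
    public Map<String, Object> mainPage(
            @RequestParam(defaultValue = "news,comments") String include) {
        Map<String, Object> response = new LinkedHashMap<String, Object>();
        for (String block : include.split(",")) {
            if ("news".equals(block)) {
                response.put("news", newsService.latest());
            } else if ("comments".equals(block)) {
                response.put("comments", commentService.latest());
            }
        }
        return response; // serialized to JSON by Spring's message converters
    }

    // Hypothetical stand-ins for the real services.
    static class NewsService { Object latest() { return new String[] { "..." }; } }
    static class CommentService { Object latest() { return new String[] { "..." }; } }
}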
I think multiple requests can generally be justified when the requested resources reflect parts of the system state (my personal rule of thumb, still a work in progress).
That is, if a news item gets displayed a lot in your client application, I would request it once and reuse it wherever I can. If you aggregate here, you would need to request it again later, some items may never actually get displayed, and you have some magic to do if the representation of a news item differs between the aggregation and the /news/{id} resource.
This approach increases communication when the page is loaded for the first time, but decreases communication throughout your client application the longer it runs.
The state on the server gets copied to your client request by request, or updated when needed (ETags, Last-Modified, etc.).
In your example it looks like /news and /comments return some sort of "latest" or "since last visit" subset, not everything.
If this is true, I would design each of them as a resource as well, like /comments/latest or similar.
But in any case I would have them contain only self-links to the /news/{id} or /comments/{id} resources. A request to /comments/latest then results in a list of self-links, and I would start a request for one only if I don't already have that item (or if I want to check whether the cached copy is still up to date).
It is also possible to trigger the request to a /news/{id} only when it actually gets displayed (scrolling, swiping).
The lifespan of a news item or a comment is probably another criterion for answering this question: client-side caching of short-lived items is not that vital to the system, in contrast to, say, a book in a book-store app.

How to handle data composition and retrieval with dependencies in Flux?

I'm trying to figure out the best way to handle a quite common situation in moderately complex apps using the Flux architecture: how to retrieve data from the server when the models that compose the data have dependencies between them. For example:
A shop web app has the following models:
Carts (the user can have multiple carts)
Vendors
Products
For each model there is an associated Store (CartsStore, VendorsStore, ProductsStore).
Assuming there are too many products and vendors to keep them loaded at all times, my problem comes when I want to show the list of carts.
I have a hierarchy of React.js components:
CartList.jsx
Cart.jsx
CartItem.jsx
The CartList component is the one that retrieves all the data from the Stores and creates the list of Cart components, passing each one its specific dependencies (Carts, Vendors, Products).
Now, if I knew beforehand which products and vendors I needed, I would just launch all three requests to the server and use waitFor in the Stores to sync the data if needed. The problem is that until I get the carts, I don't know which vendors or products I need to request from the server.
My current solution is to handle this in the CartList component: in getState I get the Carts, Vendors, and Products from each of the Stores, and in _onChange I run the whole flow.
This works for now, but there are a few things I don't like:
1) The flow seems a bit brittle to me, especially because the component is listening to 3 stores but there is only one entry point for the "something has changed in the data" event, so I'm not able to distinguish what exactly has changed and react appropriately.
2) When the component triggers some of the nested dependencies, it cannot create any action, because it is inside the _onChange method, which is considered to still be handling the previous action. Flux doesn't like that and throws a "Cannot dispatch in the middle of a dispatch." error, which means I cannot trigger any action until the whole process is finished.
3) Because of the single entry point, it is quite tricky to react to errors.
So, an alternative solution I'm thinking about is to put the "model composition" logic in the call to the API, having a wrapper model (CartList) that contains all 3 models, and storing that in a Store, which would only be notified when the whole object is assembled. The problem with that is reacting to changes in one of the sub-models coming from outside.
Has anyone figured out a nice way to handle data composition situations?
Not sure if it's possible in your application, or the right way, but I had a similar scenario and we ended up doing a pseudo-implementation of Relay/GraphQL that basically gives you the whole tree on each request. If there's lots of data it can be hard, but we just figured out the dependencies etc. on the server side, and then returned everything in a nice hierarchical format so the React components had everything they needed up to the level where the call came from.
Like I said, depending on the details this might not be feasible, but we found it a lot easier to sort out these dependencies server-side, with things like SQL/Java available, rather than, as you mentioned, making lots of async calls and messing with the stores.
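A server-side sketch of that idea in Java; all the types and DAO calls are hypothetical stand-ins for whatever the real persistence layer looks like:

import java.util.ArrayList;
import java.util.List;

public class CartTreeAssembler {

    // Flat records as they might come from the database (illustrative).
    static class Cart { long id; long vendorId; List<Long> productIds = new ArrayList<Long>(); }
    static class Vendor { long id; String name; }
    static class Product { long id; String name; }

    // The nested shape the client receives: each cart carries its own vendor
    // and products, so the client needs no follow-up requests.
    static class CartNode { Cart cart; Vendor vendor; List<Product> products = new ArrayList<Product>(); }

    interface Dao {
        Vendor vendorById(long id);
        Product productById(long id);
    }

    public List<CartNode> assemble(List<Cart> carts, Dao dao) {
        List<CartNode> tree = new ArrayList<CartNode>();
        for (Cart cart : carts) {
            CartNode node = new CartNode();
            node.cart = cart;
            node.vendor = dao.vendorById(cart.vendorId);       // dependency resolved server-side
            for (long productId : cart.productIds) {
                node.products.add(dao.productById(productId)); // instead of extra client calls
            }
            tree.add(node);
        }
        return tree;
    }
}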

In AngularJS, can I $apply / render a single scope at a time?

I have a fairly complex app, with lots of different components that update frequently. For example, a clock.
Is it possible to call $apply / $digest on only a subsection of the page at once? I don't want to call every watcher on the page for every single clock tick, for example.
I know I can achieve this by bypassing $scope.$apply entirely, and just updating the clock elements manually in a directive. Is there any hope for me?
EDIT: Actually, it looks like MAYBE what I want is to run $digest, starting at the scope I want to check, rather than $apply, since $apply kicks off the digest on $rootScope. Is this a valid way to do it?
http://plnkr.co/edit/C8aOswf46qx2GoD5uL9Y?p=preview
If your components are really decoupled, you could isolate those that generate frequent updates in their own angular app instance. They will have independent digest cycles.
Your apps can still communicate but there is a bit more overhead involved.
In order to have 2 apps, you have to start the applications manually (use angular.bootstrap instead of ng-app).
See this example: http://plnkr.co/edit/K3bnACFi79g5Kh0kFS66?p=preview
Whenever you call $scope.$apply() it also calls $apply() on all scopes that fall within that scope. If you want to call $apply() on a limited section of a page, then that section needs to have its own scope, which you can achieve by adding a controller to that section of the page. Then you can update the scopes within that section by calling $scope.$apply() on your section controller.
-- Edit --
See comments below for additional details about the differences between $apply and $digest.
Also see:
http://jimhoskins.com/2012/12/17/angularjs-and-apply.html
https://groups.google.com/forum/#!topic/angular/SSj61VOBBSc

Wicket and complex Ajax scenarios

When a screen has multiple interacting Ajax controls and you want to control the visibility of components in reaction to them (so that you only display what makes sense in any given situation), calling target.addComponent() manually on everything you want to update gets cumbersome and isn't very maintainable.
Eventually the web of onClick and onUpdate callbacks reaches a point where adding a new component to the screen becomes much harder than it is supposed to be.
What are the commonly used strategies (or even libraries if such a thing exists) to avoid this build-up of complexity?
Update: Thank you for your answers, I found all of them very useful, but I can only accept one. Sorry.
In Wicket 1.5 there is an event bus. Each component has an onEvent(IEvent<?> event) method. With component.send() you can broadcast events, and each component can check the payload (e.g. a UserJoinedEvent object) and decide whether it wants to participate in the current Ajax response. See http://www.wicket-library.com/wicket-examples/events/ for a simple demo.
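A minimal sketch of that pattern; the event class, panel, and broadcast call are illustrative:

import org.apache.wicket.ajax.AjaxRequestTarget;
import org.apache.wicket.event.IEvent;
import org.apache.wicket.markup.html.panel.Panel;

// Illustrative payload; it carries the Ajax target so receivers can add themselves.
class UserJoinedEvent {
    final AjaxRequestTarget target;
    UserJoinedEvent(AjaxRequestTarget target) { this.target = target; }
}

public class UserListPanel extends Panel {

    public UserListPanel(String id) {
        super(id);
        setOutputMarkupId(true); // required so Ajax can repaint this panel
    }

    // Each component inspects the payload and decides for itself whether
    // to take part in the current Ajax response.
    @Override
    public void onEvent(IEvent<?> event) {
        if (event.getPayload() instanceof UserJoinedEvent) {
            ((UserJoinedEvent) event.getPayload()).target.add(this);
        }
    }
}

// Somewhere in an Ajax callback, broadcast to the whole page:
// send(getPage(), org.apache.wicket.event.Broadcast.BREADTH, new UserJoinedEvent(target));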
You could add structural components such as WebMarkupContainers; when you add one of these to the Ajax target, everything contained in it gets updated as well. This allows you to update groups of components in a single line.
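For instance, a sketch along these lines (component ids and labels are made up):

import org.apache.wicket.ajax.AjaxRequestTarget;
import org.apache.wicket.markup.html.WebMarkupContainer;
import org.apache.wicket.markup.html.basic.Label;
import org.apache.wicket.markup.html.panel.Panel;

public class DetailsPanel extends Panel {

    private final WebMarkupContainer detailsBox = new WebMarkupContainer("detailsBox");

    public DetailsPanel(String id) {
        super(id);
        detailsBox.setOutputMarkupId(true); // needed for Ajax repaints
        detailsBox.add(new Label("name", "..."));
        detailsBox.add(new Label("price", "..."));
        add(detailsBox);
    }

    // In any Ajax callback, one line refreshes everything in the group
    // (target.addComponent(detailsBox) in Wicket 1.4).
    void refresh(AjaxRequestTarget target) {
        target.add(detailsBox);
    }
}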
When I'm creating components for a page I tend to add them to component arrays:
Component[] pageComponents = {
    new TextField<String>("Field1"),
    new TextField<String>("Field2"),
    new TextField<String>("Field3")
};
As of Wicket 1.5 the add functions take array parameters [1]. Therefore elements can be added to the page or target like this:
add(pageComponents);
target.add(pageComponents);
Components can then be grouped based on which ones you want to refresh together.
[1] http://www.jarvana.com/jarvana/view/org/apache/wicket/wicket/1.5-M3/wicket-1.5-M3-javadoc.jar!/org/apache/wicket/ajax/AjaxRequestTarget.html
Well, how many components are we talking about here? Ten? Twenty? Hundreds?
For up to twenty or so, you can have a state controller that determines which components should be shown. This controller sets the visible field of each component's model, and you always add all of the controller-managed components to your requests. The components' Ajax events are simply redirected to the controller's handle method.
For really large numbers of components, where the payload would be too heavy for good performance, you could use JavaScript libraries like jQuery to do the showing and hiding on the client.
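A compact sketch of such a state controller (names and the example rule are made up; target.add is the Wicket 1.5 spelling, target.addComponent in 1.4):

import java.util.HashMap;
import java.util.Map;
import org.apache.wicket.Component;
import org.apache.wicket.ajax.AjaxRequestTarget;

// Central place that decides which components are visible after each event.
public class VisibilityController {

    private final Map<String, Component> components = new HashMap<String, Component>();

    public void manage(Component c) {
        c.setOutputMarkupPlaceholderTag(true); // renders a placeholder while hidden
        components.put(c.getId(), c);
    }

    // Components redirect their Ajax events here; the controller flips
    // visibility and adds every managed component to the response.
    public void handle(String eventName, AjaxRequestTarget target) {
        if ("orderSubmitted".equals(eventName)) { // illustrative rule
            components.get("orderForm").setVisible(false);
            components.get("confirmation").setVisible(true);
        }
        for (Component c : components.values()) {
            target.add(c);
        }
    }
}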
I currently use a sort of modified observer pattern to simulate an event bus in Wicket 1.4.
My pages act as the observable: my components don't know each other and are reused in different combinations across multiple pages. Whenever one component receives an Ajax event that could affect other components, it calls a method on its page with an event object and the Ajax target. The page calls a similar method on all components that have registered themselves for this kind of event, and each component can decide, based on the supplied event object, if and how it has to react, and can attach itself to the target.
The same can be achieved by using a Wicket visitor. I don't know which one is better; I think that's mainly a matter of taste.
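A rough sketch of that page-as-event-bus idea (all names made up; Wicket 1.4 style):

import java.util.ArrayList;
import java.util.List;
import org.apache.wicket.ajax.AjaxRequestTarget;
import org.apache.wicket.markup.html.WebPage;

// Components that want to react implement this and register with the page.
interface AjaxEventListener {
    void onAjaxEvent(Object event, AjaxRequestTarget target);
}

public class EventBusPage extends WebPage {

    private final List<AjaxEventListener> listeners = new ArrayList<AjaxEventListener>();

    public void register(AjaxEventListener listener) {
        listeners.add(listener);
    }

    // A component that received an Ajax event calls this with its event object;
    // every registered component decides whether and how to react, and can
    // attach itself to the target.
    public void fire(Object event, AjaxRequestTarget target) {
        for (AjaxEventListener listener : listeners) {
            listener.onAjaxEvent(event, target);
        }
    }
}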

Storing, Loading, and Updating a Trie in ASP.NET MVC 3

I have a trie-based word detection algorithm for a custom dictionary. Note that regular expressions are too brittle for this dictionary, as entries may contain spaces, periods, etc.
I've implemented the algorithm in a local C# app that reads in the dictionary from file and stores the trie in memory (it's compact, so no RAM size issues at all). Now I would like to use this algorithm in an MVC 3 app on a cloud host like AppHarbor, with the added twist that I want a web interface to enable adding/editing words.
It's fast enough that loading the dictionary from file and building the trie every time a user uploads their text would not be an issue (< 1s on my laptop). However, if I want to enable admins to edit the dictionary via the web interface, that would seem tricky since the dictionary would potentially be getting updated while a user is trying to upload text for analysis.
What is the best strategy for storing, loading, and updating the trie in an MVC 3 app?
I'm not sure if you are looking for specific implementation details or more conceptual ideas about how to handle this, but I'll throw some ideas out there for now.
Actual Trie Classes - Here is a good C# example of classes for setting up a trie. It sounds like you already have this part figured out.
Storing - I would persist the trie data to XML unless you are already using a database and have some need to keep it in a DBMS. The XML will be simple to work with in the MVC application, and you don't need to worry about database connectivity issues or the added cost of a database. I would also keep two versions of the trie data on the server: a production copy and a production-support copy, the latter being the one your admin performs transactions against.
Loading - In the admin module of your application, you may implement a feature for loading the trie data into memory; how often you reload depends on your application's needs. It could be scheduled or available as a manual function. As on WordPress sites, if a user accesses the site while it is updating, they would receive a message that the site is undergoing maintenance. You may also choose to load into memory on demand only, or to keep the trie loaded at all times except when problems occur.
Updating - I'd have a second database (or XML file) that is used for applying updates. The method of applying updates to production would depend partly on the frequency, quantity, and timing of updates. One safe method might be to store the transactions entered by the admin.
For example:
trie.put("John", 112);
trie.put("Doe", 222);
trie.Remove("John");
Then apply these transactions to your production data as needed via an admin function, putting your site into "maint" mode if needed. If the updates are few and fast, you may be able to code the site so that it holds all work until the transactions are processed; a user might have to wait a few milliseconds longer for a result, but you wouldn't have to worry about data-mutation issues.
This is pretty vague, but I'm just throwing some ideas out there; if you leave comments I'll try to give more.
1 Store the trie in the cache:
It is not highly dynamic data, and caching helps us with other tasks (like concurrent access to the trie by admin and users).
2 Make access to the cache clean:
public class TrieHelper
{
    public Trie MyTrie
    {
        get
        {
            if (HttpContext.Current.Cache["myTrieKey"] == null)
                HttpContext.Current.Cache["myTrieKey"] = LoadTrieFromFile(); // returns a Trie object
            return (Trie)HttpContext.Current.Cache["myTrieKey"];
        }
    }
3 Lock the trie object while an add operation is in progress:
    public void AddWordToTrie(string word)
    {
        var trie = MyTrie;
        lock (HttpContext.Current.Cache["myTrieKey"])
        {
            trie.AddWord(word);
        } // the trie does not need to stay locked while the new word is written to the file
        WriteNewWordToTrieFile(word); // should lock the FileWriter object
    }
}
4 If editing is performed by one admin at a time, store the trie in an XML file. It will be easy to implement the logic for finding the element after which your word should be added (you can create a function that uses the MyTrie object in memory) and to add it using LINQ to XML.
I've got kind of the same thing, but 10 times bigger :)
The client designs its own calendar with questions and possible answers, while in the meantime a version is online and being used by normal users.
What I came up with was something like test-and-deploy. The admin enters the calendar values and sets everything up correctly; afterwards he can use a Preview button to see if it is the way he needs/wants it, and then, to make the changes valid for all end users, he needs to push Deploy.
He, as an ADMIN, will know that until he pushes the DEPLOY button, all users accessing the calendar will get the old values. As soon as he hits Deploy, everything is saved in the database, and the files he uploaded are pushed to Amazon S3 (for faster access).
I then update the cache with the new calendar, and the new Calendar object stays cached until the app pool says otherwise or he hits the Deploy button again.
You could do something like this.
As you are going to run your application in a cloud environment, I'd suggest taking a look at CQRS and durable messaging, and providing some concurrency model (possibly optimistic concurrency with intelligent conflict detection; see http://skillsmatter.com/podcast/design-architecture/cqrs-not-just-for-server-systems at 5:00).
Also, obviously, you need to analyze your business requirements more precisely because, as Udi Dahan mentioned, race conditions are the result of a lack of business analysis.
