Hippo cms cache method - caching

What kind of cache does Hippo CMS use?
I found something about the bundle cache on the official page, but I have no idea where the information is stored or how to get it out.
My main problem is that I need to synchronize the L2 cache between many instances of a Hippo CMS application.

Hippo CMS has caching implemented at several different levels of the overall architecture. It's important to keep in mind that the Hippo stack has 3 main components:
The content repository (backed by an RDBMS)
The Hippo delivery tier (HST)
The Hippo CMS UI
My assumption is that you're probably trying to do this for the delivery part of the application. Let's take a bottom-up approach and see what different types of caching are available.
The content repository caches raw data from the persistence layer in the Bundle Cache. This is an in-memory cache provided by Apache Jackrabbit and its size can be configured in the repository.xml. You can already consider this an L2 Cache if you would compare this to for instance Hibernate, since it's shared by all JCR sessions. You can find more information on how to tune this cache on the corresponding documentation page.
The Hippo delivery tier uses a couple of in-memory model representations derived from repository configuration and four different caches: Binaries Cache, WebFiles Cache, Node Cache and Page Cache.
The Binaries Cache is used to cache static resources that are stored in the content repository, such as PDFs and images. It can be configured to be served from memory or disk. The binaries cache is using Ehcache in the background. More information on how to configure or specify your own cache can be found here.
The WebFiles Cache also caches static resources from the repository, but rather than typical user content such as PDFs and images, it holds static web application content maintained by developers.
The Node Cache caches the contents of individual JCR nodes retrieved from the content repository. This includes actual content items (documents) as well as configuration stored in the repository such as web page layouts and URL mappings. The observation pattern is used to invalidate node cache entries when the original node in the repository is modified.
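As a rough illustration of the observation pattern described above, here is a minimal Python sketch. It is not Hippo's or Jackrabbit's API: the `Repository` and `NodeCache` classes, their methods, and the `/content/home` path are all hypothetical names made up for this example.

```python
from typing import Callable, Dict, List


class Repository:
    """Hypothetical stand-in for a content repository that supports observation."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def read(self, path: str) -> str:
        return self._nodes[path]

    def write(self, path: str, value: str) -> None:
        self._nodes[path] = value
        # Notify every observer that this node changed.
        for listener in self._listeners:
            listener(path)


class NodeCache:
    """Caches node contents and drops an entry when the node is modified."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository
        self._entries: Dict[str, str] = {}
        self.repository_reads = 0
        repository.add_listener(self._invalidate)

    def _invalidate(self, path: str) -> None:
        self._entries.pop(path, None)

    def get(self, path: str) -> str:
        if path not in self._entries:
            self.repository_reads += 1
            self._entries[path] = self._repository.read(path)
        return self._entries[path]


repo = Repository()
repo.write("/content/home", "v1")
cache = NodeCache(repo)
first = cache.get("/content/home")   # miss: read from the repository
second = cache.get("/content/home")  # hit: served from the cache
repo.write("/content/home", "v2")    # the observer invalidates the entry
third = cache.get("/content/home")   # miss again: sees the new value
```

The cache never polls the repository. It only reacts to change notifications, so entries stay valid for as long as the underlying node is untouched.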
The Page Cache caches complete aggregated pages. This means that web pages can be served directly from cache without any content retrieval and page aggregation as long as no modifications are made to the content and configuration making up the page. Once any of these is modified the delivery tier is notified through observation and invalidates the cached page. Requests arriving while the modified page is being aggregated can be served a stale cache entry to keep response times low. More information can be found here. The page cache is based on ehcache and can be configured through Spring configuration.
As you can see there are several levels of caching available. You have to keep in mind though that if you have multiple Hippo instances running, the caches will be automatically invalidated on all individual instances because of how clustering and cache invalidation works within Hippo, so in the end you might not even need to introduce additional caching.
You can find more information on the performance documentation page.

Related

What are the size limits for Laravel's file-based caching?

I am a new developer and am trying to implement Laravel's (5.1) caching facility to improve the speed of my app. I started out caching a large DB table that my app constantly references - but it got too large so I have backed away from that and am now 'forever' caching smaller chunks of data - for example, for each page only the portions of that large DB table that are relevant.
I have watched 'Caching Essentials' on Laracasts, done some Googling and had a search in this forum (and Laracasts') but I still have a couple of questions:
I am not totally clear on how the cache size limits work when you are using Laravel's file-based system - is there an overall in-app size limit for the cache or is one limited size-wise only per key and by your server size?
What are the signs you should switch from file-based caching to something like Memcached or Redis - and what are the benefits of using one of those services? Is it the fact that your caching is handled on a different server (thereby lightening the load on your own)? Do you switch over to one of these services when your local, file-based cache gets too big for your server?
My app utilizes several tables that have 3,000-4,000 rows - the data in these tables is constantly referenced and will remain static unless I decide to add new options. I am basically looking for the best way to speed up queries to the data in these tables.
Thanks!
I don't think Laravel imposes any limitations on its file I/O at all - the limitations will be in how much PHP can read from / write to a file at once, or hold in its memory / process at any one time.
It does serialise the data that you cache, and unserialise it when you reload it, so your PHP environment would have to be able to process the entire cache file (which is equivalent to the top level cache key) at once. So, if you are getting cacheduser.firstname, it would have to load the whole cacheduser key from the file, unserialise it, then get the firstname key from that.
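To make the cost concrete, here is a small Python sketch of a file cache that behaves the way this answer describes. It is not Laravel's implementation: the `FileCache` class and its dotted-key lookup are assumptions made for illustration, and `pickle` stands in for PHP's serialize / unserialize.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


class FileCache:
    """Toy file cache: one serialised file per top-level key."""

    def __init__(self, directory: Path) -> None:
        self._directory = directory

    def put(self, key: str, value: Any) -> None:
        (self._directory / key).write_bytes(pickle.dumps(value))

    def get(self, dotted_key: str) -> Any:
        top_level, _, rest = dotted_key.partition(".")
        # The entire file is read and deserialised, whatever part is wanted.
        value = pickle.loads((self._directory / top_level).read_bytes())
        for part in filter(None, rest.split(".")):
            value = value[part]
        return value


with tempfile.TemporaryDirectory() as tmp:
    cache = FileCache(Path(tmp))
    cache.put("cacheduser", {"firstname": "Ada", "bio": "x" * 100_000})
    firstname = cache.get("cacheduser.firstname")
    # Size of the file that had to be loaded to answer that one lookup.
    bytes_read = (Path(tmp) / "cacheduser").stat().st_size
```

Fetching a three-letter first name still means reading and deserialising more than 100 KB here, which is why the size of each top-level key matters more than the total size of the cache directory.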
I would take the PHP memory limit (classic, I know!) as a first point to investigate if you want to keep going down this road.
Caching services like Redis or memcached are bespoke, optimised caching solutions. They take some of the logic and responsibility out of your PHP environment.
They can, for example, retrieve sub-keys from items without having to process the whole thing, so can retrieve part of some cached data in a memory efficient way. So, when you request cacheduser.firstname from redis, it just returns you the firstname attribute.
They have other advantages regarding tagging / clearing out subsets of caches (see the cache tags section of the Laravel docs: https://laravel.com/docs/5.4/cache#cache-tags).
Another thing to think about is scaling. If your site is large enough, and is load-balanced across multiple servers, the filesystem caching may be different across those servers, as each server can only check their local filesystem for the cache files. A caching service can be on a different server (many hosts will have a separate redis / memcached services available), so isn't victim to this issue.
Also - as I understand it (and this might be the most important thing), the file cache driver in Laravel is mainly for local development and testing. Although it can work fine for simple applications with basic caching needs, it's not intended for large scalable production environments.
Personally, I develop locally and test with file caching, as I'm only dealing with small amounts of data then, and use Redis to cache on production environments.
It doesn't necessarily need to be on a separate server to get the benefits. If you are never going to scale to multiple application servers, then using a caching service on the same server will already be a large improvement to caching large documents.

What are some caches that are responsible for fetching the data on a miss?

The book 'architecture of open source software' says that the most common type of global cache in a web application is responsible for fetching the data itself, in case it is missing, as shown in this figure. This seems different from what I've encountered so far. Most applications I have encountered make the application server responsible for fetching data from the db, and updating the server. At first, I thought the book might be talking about caching proxies, like Varnish, but they cover those in the next section, so that doesn't seem to be the case.
What cache systems actually fetch the data in case of a miss, and how do they know how to interact with the database?
Caching solutions provide read-through/write-behind features, which let users configure a read-through/write-behind provider by implementing an interface and deploying it with the cache server. These providers contain the logic for how the cache server interacts with the database to load and save data.
On a cache fetch operation, if the data is not present in the cache server, the cache loads it from the database using the configured provider, so the client never has to handle the miss itself.
This way client applications treat the cache as their only data source, and the cache itself is responsible for interactions with the database. You can read further details in this article by Iqbal Khan.
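The read-through idea can be sketched in a few lines of Python. This is a generic illustration, not the provider interface of NCache, TayzGrid, or any other product: `ReadThroughCache`, `load_user`, and the in-memory `database` dict are hypothetical stand-ins.

```python
from typing import Callable, Dict, Generic, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Cache that loads missing entries itself through a configured provider."""

    def __init__(self, load_from_source: Callable[[K], V]) -> None:
        self._load = load_from_source
        self._entries: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key not in self._entries:
            # Miss: the cache, not the caller, goes to the data source.
            self._entries[key] = self._load(key)
        return self._entries[key]


# Hypothetical data source; a real provider would query the database here.
database = {1: "alice", 2: "bob"}
database_reads: List[int] = []


def load_user(user_id: int) -> str:
    database_reads.append(user_id)
    return database[user_id]


users = ReadThroughCache(load_user)
first = users.get(1)   # miss, loaded through the provider
second = users.get(1)  # hit, no database access
```

The calling code only ever talks to `users.get`. Knowledge of the database lives entirely in the provider function handed to the cache.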
NCache and TayzGrid are enterprise solutions among many others that provide this feature.

Caching all entities in Cache Layer and Synchronizing with Database

Is it possible, reliable and secure to cache all entities in a distributed cache and notify the DAO layer on update? My idea is:
Use JPA 2.1 and Hibernate implementation.
On creation, persist the entity to the db.
After persisting it, cache it in the distributed cache.
Channel all read actions to the cache.
On update, notify the DAO layer to update the entity.
Yes, you can design a system that does the following:
On addition: persists data to the db and adds it to the cache
On read: reads data from the cache, and treats a cache miss as meaning the data is not present in the database either
On update: updates data in the db and then updates it in the cache (or vice versa)
On delete: deletes data from the cache and then deletes it from the database
This approach will work fine if you have a single application using that database and if data is not that critical. However if data integrity is of more importance, you may face following problems in this approach:
You may face a cache miss when data is present in the database (persisted but not yet cached)
You may get stale data from the cache (updated in the db but not yet updated in the cache)
Also, if data is removed from the database by some other application, it will still remain cached in the distributed cache (invalid data on reads)
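The four operations and two of the failure modes above can be reproduced with a small Python sketch. Plain dicts stand in for the database and the distributed cache, and the `CachedStore` class is a hypothetical name for this example, not part of JPA or Hibernate.

```python
from typing import Dict, Optional


class CachedStore:
    """Write to both stores, read only from the cache."""

    def __init__(self) -> None:
        self.database: Dict[int, str] = {}
        self.cache: Dict[int, str] = {}

    def add(self, key: int, value: str) -> None:
        self.database[key] = value
        self.cache[key] = value

    def read(self, key: int) -> Optional[str]:
        # A miss is treated as "not in the database either".
        return self.cache.get(key)

    def update(self, key: int, value: str) -> None:
        self.database[key] = value
        self.cache[key] = value

    def delete(self, key: int) -> None:
        self.cache.pop(key, None)
        self.database.pop(key, None)


store = CachedStore()
store.add(1, "alice")
store.update(1, "alicia")
seen_by_app = store.read(1)

# Another application writes to the database directly, bypassing the cache.
store.database[2] = "bob"
missed = store.read(2)      # miss, although the row exists in the database
store.database.pop(1)
stale = store.read(1)       # still served, although the row is gone
```

As long as every write goes through `CachedStore`, the two stores agree. The moment anything else touches the database, reads can miss existing rows or return deleted ones.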
A better method may be to use a feature-rich distributed caching solution like NCache / TayzGrid, which provides read-through / write-behind features. This way your application only needs to use the cache for all reads, writes and updates, and the cache keeps the database updated using the configured providers.
Another approach may be to use the distributed cache as Hibernate's second-level cache, so you will not need to add a caching layer yourself. See this article for details about Hibernate's second-level cache.
Distributed caching solutions like TayzGrid provide a caching provider for Hibernate that can be easily configured. You can find Hibernate providers for other solutions as well.
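For the write-behind half of that suggestion, a minimal Python sketch might look like this. It is an illustration under simplifying assumptions, not the API of any product: `WriteBehindCache` is a hypothetical class, and `flush` is called by hand where a real cache server would write to the database asynchronously in the background.

```python
from collections import OrderedDict
from typing import Callable, Dict


class WriteBehindCache:
    """Accepts writes immediately and persists them to the source later."""

    def __init__(self, save_to_source: Callable[[str, str], None]) -> None:
        self._save = save_to_source
        self._entries: Dict[str, str] = {}
        self._pending: "OrderedDict[str, str]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self._entries[key] = value
        # Queue the write; a later value for the same key replaces it.
        self._pending[key] = value

    def get(self, key: str) -> str:
        return self._entries[key]

    def flush(self) -> int:
        """Persist all queued writes and return how many were written."""
        written = 0
        while self._pending:
            key, value = self._pending.popitem(last=False)
            self._save(key, value)
            written += 1
        return written


database: Dict[str, str] = {}
cache = WriteBehindCache(database.__setitem__)
cache.put("a", "1")
cache.put("a", "2")
before_flush = dict(database)  # nothing has reached the database yet
flushed = cache.flush()        # both updates collapse into one write
```

The application sees its own writes at once through `get`, while the database catches up later. The trade-off is that queued writes are lost if the cache fails before they are flushed.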

What is the difference between normal browser caching and ASP cache object? How do they differ?

I am a beginner to the ASP .Net caching concept.
What is the difference between normal browser caching and the ASP cache object? How do they differ?
Why do we need to have a cache on the server? Will it not cause a memory overhead on the server?
ASP .Net allows to cache an ASP .Net page's response in multiple ways. You can specify to cache that page in browser or in your application domain.
Normal browser caching, or page caching, refers to caching an object in the requesting browser's cache so that the next request for the same page can be served locally. A page can be cached in the requesting browser, a proxy server, the application server, or several of these. See this article to set the cacheability of a page: http://msdn.microsoft.com/en-us/library/w9s3a17d(v=vs.100).aspx
The cache object in ASP .Net, on the other hand, is created once per application domain. It is an in-memory cache that can be used to store sessions or for any other object caching purpose, like caching data loaded from a database. Note that because this is an in-memory cache (like a hashtable in a program), any data stored in it is available to that application only.
Why do we need to have a cache on the server? Will it not cause a memory overhead on the server?
Yes, a cache on the server will cause memory overhead, but caching always trades memory for performance. For example, instead of loading the same data from a database on each request, the data can be loaded into the cache once and all subsequent requests can be served from the cache, thus improving performance and reducing load on the database server.
Apart from browser caching and in-memory caching, several out of process, distributed caching solutions like NCache are also available to boost performance of ASP .Net application and to overcome limitations in ASP.Net caching options. You can see further details here: http://www.alachisoft.com/ncache/asp-net-cache.html
I hope this helped :)

With Memcached and Squid, is there any need for ASP.NET caching?

With squid, we can cache webpages. I am not sure if it provides the same number of caching methods as ASP.NET caching (I primarily use ASP.NET), but it's a tool to cache webpages.
Then we have memcached, which can cache database tables. I believe this is correct, and it is like SqlCacheDependency (correct me if I am wrong).
However, is there any situation in a large web application where one would find room to use memcached, squid, AND ASP.NET (or PHP, JSP - application framework-level) caching.
Thanks!
You may find that caching entire pages is too coarsely-grained, and caching database tables doesn't get you enough of a boost, leaving a big need for caching chunks of stuff.
Say, for example, you had an application that showed the name of the logged-in user on every page. Caching entire pages wouldn't really work, so you need to drop down a level and cache somewhere within the app framework.
Then we have memcached, which can cache database tables. I believe this is correct, and it is like SqlCacheDependency (correct me if I am wrong).
Memcached is a distributed hashtable. The main benefits over the built in .NET caching is that your cache is scalable (you can add as many memcached boxen as you want) and synchronized (all your web servers have access to the same stuff, and invalidating or updating data from one web server is instantly propagated to all of them)
Performance-wise, it is worse than the .NET cache (you are looking up objects across servers, as opposed to an in-memory lookup on one machine)
However, is there any situation in a large web application where one would find room to use memcached, squid, AND ASP.NET (or PHP, JSP - application framework-level) caching.
For the reasons above, I can imagine a 2-level cache, using the .NET cache first, then memcached. (e.g. a Get() looks at memcached, stores the result in the .NET cache set to expire in 10 seconds, then uses the .NET cache for all the get calls with the same cache key during the next 10 seconds, rinse, repeat)
This way, you get the performance of the in-memory cache lookup without the network IO cost of a pure memcached solution, with the synchronization and scalability benefits of memcached.
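A minimal Python sketch of that two-level lookup follows. It does not use the real .NET cache or a memcached client: `TwoLevelCache` is a hypothetical class, a plain function stands in for the memcached call, and the clock is injected so that the 10-second expiry can be shown without actually waiting.

```python
import time
from typing import Callable, Dict, List, Tuple


class TwoLevelCache:
    """Short-lived local cache in front of a slower shared cache."""

    def __init__(
        self,
        remote_get: Callable[[str], str],
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._remote_get = remote_get
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, value)
        self._local: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh local copy, no network round trip
        value = self._remote_get(key)
        self._local[key] = (now + self._ttl, value)
        return value


# Stand-in for the shared cache (memcached in the answer above).
remote = {"greeting": "hello"}
remote_lookups: List[str] = []


def remote_get(key: str) -> str:
    remote_lookups.append(key)
    return remote[key]


now = [0.0]  # fake clock, advanced by hand
cache = TwoLevelCache(remote_get, ttl_seconds=10.0, clock=lambda: now[0])
a = cache.get("greeting")   # goes to the shared cache
remote["greeting"] = "hi"   # another web server updates the shared value
now[0] = 5.0
b = cache.get("greeting")   # local copy, up to 10 seconds out of date
now[0] = 10.0
c = cache.get("greeting")   # local copy expired, refreshed from the shared cache
```

The price of the local layer is bounded staleness: a value changed in the shared cache can take up to the TTL to become visible on each web server.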
