Caching domain objects in Grails - caching

I have been considering implementing EhCache in my Grails domain objects like this :
static mapping = {
cache true
}
I am not too familiar with exactly how this caching mechanism works and was wondering what a good rule of thumb is in determining which domain objects would benefit from being cached. eg, objects that are accessed rarely.. often... ?
Thanks!

Caching only works for get() calls by default, but queries use the query cache if you update them with cache: true (criteria and HQL).
cache true creates a read-write cache but you can configure a read-only cache with
static mapping = {
cache usage:'read-only'
}
The read-only cache is good for lookup data that never changes, for example states, countries, roles, etc.
If you have domain classes that update, create, or delete frequently, query caching will often be slower than not caching. This is because changes like these cause all cached queries to be cleared, so you're often going directly to the database anyway. See http://tech.puredanger.com/2009/07/10/hibernate-query-cache/ for a more detailed description of this. For this reason I rarely use query caching and often disable it completely with
hibernate {
cache.use_second_level_cache=true
cache.use_query_cache=false
cache.provider_class='org.hibernate.cache.EhCacheProvider'
}
Domain classes that are "read-mostly" are the best candidates for read-write caching. The caches get cleared for each update, create, and delete, but if these are somewhat rare you'll see an overall performance boost.
Hibernate has an API to monitor cache usage. The http://grails.org/plugin/app-info and http://grails.org/plugin/hibernate-stats plugins make the information available and you can use the approach there in your own code.

Related

Dealing with concurrency issues when caching for high-traffic sites

I was asked this question in an interview:
For a high traffic website, there is a method (say getItems()) that gets called frequently. To prevent going to the DB each time, the result is cached. However, thousands of users may be trying to access the cache at the same time, and so locking the resource would not be a good idea, because if the cache has expired, the call is made to the DB, and all the users would have to wait for the DB to respond. What would be a good strategy to deal with this situation so that users don't have to wait?
I figure this is a pretty common scenario for most high-traffic sites these days, but I don't have the experience dealing with these problems--I have experience working with millions of records, but not millions of users.
How can I go about learning the basics used by high-traffic sites so that I can be more confident in future interviews? Normally I would start a side project to learn some new technology, but it's not possible to build out a high-traffic site on the side :)
The problem you were asked on the interview is the so-called Cache miss-storm - a scenario in which a lot of users trigger regeneration of the cache, hitting in this way the DB.
To prevent this, first you have to set soft and hard expiration date. Lets say the hard expiration date is 1 day, and the soft 1 hour. The hard is one actually set in the cache server, the soft is in the cache value itself (or in another key in the cache server). The application reads from cache, sees that the soft time has expired, set the soft time 1 hour ahead and hits the database. In this way the next request will see the already updated time and won't trigger the cache update - it will possibly read stale data, but the data itself will be in the process of regeneration.
Next point is: you should have procedure for cache warm-up, e.g. instead of user triggering cache update, a process in your application to pre-populate the new data.
The worst case scenario is e.g. restarting the cache server, when you don't have any data. In this case you should fill cache as fast as possible and there's where a warm-up procedure may play vital role. Even if you don't have a value in the cache, it would be a good strategy to "lock" the cache (mark it as being updated), allow only one query to the database, and handle in the application by requesting the resource again after a given timeout
You could probably be better of using some distributed cache repository, as memcached, or others depending your access pattern.
You could use the Cache implementation of Google's Guava library if you want to store the values inside the application.
From the coding point of view, you would need something like
public V get(K key){
V value = map.get(key);
if (value == null) {
synchronized(mutex){
value = map.get(key);
if (value == null) {
value = db.fetch(key);
map.put(key, value);
}
}
}
return value;
}
where the map is a ConcurrentMap and the mutex is just
private static Object mutex = new Object();
In this way, you will have just one request to the db per missing key.
Hope it helps! (and don't store null's, you could create a tombstone value instead!)
Cache miss-storm or Cache Stampede Effect, is the burst of requests to the backend when cache invalidates.
All high concurrent websites I've dealt with used some kind of caching front-end. Bein Varnish or Nginx, they all have microcaching and stampede effect suppression.
Just google for Nginx micro-caching, or Varnish stampede effect, you'll find plenty of real world examples and solutions for this sort of problem.
All boils down to whether or not you'll allow requests pass through cache to reach backend when it's in Updating or Expired state.
Usually it's possible to actively refresh cache, holding all requests to the updating entry, and then serve them from cache.
But, there is ALWAYS the question "What kind of data are you supposed to be caching or not", because, you see, if it is just plain text article, which get an edit/update, delaying cache update is not as problematic than if your data should be exactly shown on thousands of displays (real-time gaming, financial services, and so on).
So, the correct answer is, microcache, suppression of stampede effect/cache miss storm, and of course, knowing which data to cache when, how and why.
It is worse to consider particular data type for caching only if data consumers are ready for getting stale date (in reasonable bounds).
In such case you could define invalidation/eviction/update policy to keep you data up-to-date (in business meaning).
On update you just replace data item in cache and all new requests will be responsed with new data
Example: Stocks info system. If you do not need real-time price info it is reasonable to keep in cache stock and update it every X mils/secs with expensive remote call.
Do you really need to expire the cache. Can you have an incremental update mechanism using which you can always increment the data periodically so that you do not have to expire your data but keep on refreshing it periodically.
Secondly, if you want to prevent too many users from hiting the db in one go, you can have a locking mechanism in your stored proc (if your db supports it) that prevents too many people hitting the db at the same time. Also, you can have a caching mechanism in your db so that if someone is asking for the exact same data from the db again, you can always return a cached value
Some applications also use a third service layer between the application and the database to protect the database from this scenario. The service layer ensures that you do not have the cache miss storm in the db
The answer is to never expire the Cache and have a background process update cache periodically. This avoids the wait and the cache-miss storms, but then why use cache in this scenario?
If your app will crash with a "Cache miss" scenario, then you need to rethink your app and what is cache verses needed In-Memory data. For me, I would use an In Memory database that gets updated when data is changed or periodically, not a Cache at all and avoid the aforementioned scenario.

Caching (Ehcache) - Hibernate 2nd level cache and Spring

In my web application (Spring 3.1, Hibernate 4), I am using Ehcache for Hibernate 2nd level cache and Spring #Cache. I would like to know, where to use Hibernate Cache and Spring Cache?
For Example, I have few domain classes (view in database) which I am using as lookup values on screen. I can cache them using Hibernate 2nd level cache as well as Spring #Cache.
So, in my service layer if I cache these domain objects using Spring #Cache, I would receive these objects without hitting persistence layer at all (hibernate HQL query), once cached. Is it right approach?
Depends on your layer architecture.
Assume you have three services (or three methods within the same service) that all return a collection of Customer entities i.e. domain objects. If you cache at service layer there's a fair chance the same representation of a single database record will live in the cache multiple times. They are multiple objects of essentially the same information. Why? Because the results of Service.getWhateverCustomers(String) and Service.getWhateverCustomers(String, Integer) are stored under two different cache keys.
If you cache at entity level using the JPA #Cachable annotation on the other hand your Customer entity is cached no matter from which service or service method you call the code that actually retrieves the entity. Of course the rules about when the JPA provider can/does cache an entity apply. Read up on them if you're not familiar with them.
Hope this gives you an idea which path to follow. Post follow-up comments if you have more questions and I'll edit this answer.
The right approach is:
Ask yourself if you even need to mess with the complexity of caching. Is your app failing to perform up to requirements?
Only if the answer to the previous question is "yes", profile your app to find out where the performance problem(s) is/are.
Determine the appropriate way to solve a performance problem identified in step 2. It may or may not involve caching to prevent costly operations. If it does involve caching, then where to cache and which cache to use should be abundantly clear because you'll know exactly what you're trying to prevent from happening.
The moral of the story is that you don't cache because it's cool. You cache for performance. And you only optimize code when it's proven necessary.

Plone 4.2 how to make PAS cache external usera data

I'm implementing a PAS plugin that handles authentications against mailservers. Actually only DBMail is implemented.
I realized, that the enumerateUsers function from the PAS plugin is called numerous times per request and requires my plugin to open/close an SQL connections for every (subsequent) request. Of course, this is very expensive.
The connections itself are handled in a plone tool, which is able to handle multiple different mailservers and delegeates the enumerateUsers call to wrapper objects that represent registered servers.
My question is now, what sort of cache (OOBTree, Session?) I should use to provide a temporary local storage for repeating enumerations and avoid subsequent SQL connections?
Another idea was, to hook into the user creation process that takes place on the first login, an external user issues and completely "localize" the users.
Third idea was, to store the needed data in the specific member, if possible.
What would be best practice here?
I'd cache the query results, indeed. You need to make a decision on how long to cache the results, and if stored long term, how to invalidate that cache or check for changes.
There are no best practices for these decisions, as they depend entirely on the type of data stored and the APIs of the backends. If they support some kind of freshness query, for example, then you store everything forever and poll the backend to see if the cache needs updating.
You can start with a simple request cache; query once per request, store it on the request object. Your cache will automatically be invalidated at the end of the request as the request object is cleaned up, the next request will be a clean slate.
If your backend users rarely change, you can cache information for longer, in a local cache. I'd use a volatile attribute on the plugin. Any attribute starting with _v_ is ignored by the persistence machinery. Thus, anything stored in a _v_ volatile attribute is both thread-local and only exists for the lifetime of the process, a restart of the server clears these automatically.
At the very least you should use an _v_ volatile attribute to store your backend SQL connections. That way they can stay open between requests, and can be re-used. Something like the following method would do nicely:
def _connection(self):
# Return a backend connection
if getattr(self, '_v_connection', None) is None:
# Create connection here
self._v_connection = yourdatabaseconnection
return self._v_connection
You could also use a persistent attribute on your plugin to store your cache. This cache would be committed to the ZODB and persist across restarts. You then really need to work out how to invalidate the contents; store timestamps and evict data when to old, etc.
Your cache datastructure depends entirely on your application needs. If you don't persist information, a dictionary (username -> information) could be more than enough. Persisted caches could benefit from using a OOBTree instead of a dictionary as they reduce chances of conflicts between different threads and are more efficient when it comes to large sets of data.
Whatever you do, you do not need to use a Session. Sessions are prone to conflicts, do not scale well, and are in any case not the place to store a cache of this kind.

Flushing entire cache at once with Enterprise Caching Block

I am looking into using Enterprise Caching Block for my .NET 3.5 service to cache a bunch of static data from the database.
From everything I have read, it seems that FileDependency is the best option for storing static data that does not expire too often. However, when the file changes and the cache is flushed, I need to get a callback once to do some post processing for that particular cache. If I implement ICacheItemRefreshAction and register it during adding an item to the cache, I get a callback for each one of them.
Is there a way to register a callback for the entire cache so that I dont see thousands of callbacks being invoked when the cache flushes?
Thanks
To address your follow up for a better way than FileDependency: you could wrap a SqlDependency in an ICacheItemExpiration. See SqlCacheDependency with the Caching Application Block for sample code.
That approach would only work with SQL Server and would require setting up Service Broker.
In terms of a cache level callback, I don't see an out of the box way to achieve that; almost everything is geared to the item level. What you could do would be to create your own CacheManager Implementation that features a cache level callback.
Another approach might be to have a ICacheItemRefreshAction that only performs any operations when the cache is empty (i.e. the last item has been removed).

JPA2 Entities Caching

As it stands I am using a JSF request scoped bean to do all my CRUD operations. As I'm sure you most likely know Tomcat doesn't provide container managed persistence so in my CRUD request bean I am using EnityManagerFactory to get fold of enity manager. Now about the validity of my choice to use request scoped bean for this task, it's probably open for a discussion (again) but I've been trying to put it in the context of what I've read in the articles you gave me links to, specifically the first and second one. From what I gather EclipseLink uses Level 2 cache by default which stored cached entity. On ExlipseLink Examples - JPA Caching website it says that:
The shared cache exists for the duration of the persistence unit ( EntityManagerFactory, or server)
Now doesn't that make my cached entities live for a fraction of time during the call that is being made to the CRUD request bean because the moment the bean is destroyed and with it EntityManagerFactory then so is the cache. Also the last part of the above sentence "EntityManagerFactory, or server" gets me confused .. what precisely is meant by or server in this context and how does one control it. If I use the #Cache annotation and set appropriate amount of expire attribute, will that do the job and keep the entities stored on the servers L2 cache than, regardless of whether my EntityManagerFactory has been destroyed ?
I understand there is a lot of consideration to do and each application has specific requirements . From my point of view configuring L2 cache is probably the most desirable (if not only, on Tomcat) option to get things optimized. Quoting from your first link:
The advantages of L2 caching are:
avoids database access for already loaded entities
faster for reading frequently accessed unmodified entities
The disadvantages of L2 caching are:
memory consumption for large amount of objects
stale data for updated objects
concurrency for write (optimistic lock exception, or pessimistic lock)
bad scalability for frequent or concurrently updated entities
You should configure L2 caching for entities that are:
read often
modified infrequently
not critical if stale
Almost all of the above points apply to my app. At the heart of it, amongst other things, is constant and relentless reading of entities and displaying them on the website (the app will serve as a portal for listing properties). There's also a small shopping cart being build in the application but the products sold are not tangible items that come as stock but services. In this case stale entities are no problem and also, so I think, isn't concurrency as the products (here services) will never be written to. So the entities will be read often, and they will be modified infrequently (and those modified are not part of the cart anyway, an even those are modified rarely) and therefore not critical if stale. Finally the first two points seem to be exactly what I need, namely avoidance of database access to already loaded entities and fast reading of frequently accessed unmodified enteties. But there is one point in disadvantages which still concerns me a bit: memory consumption for large amount of objects. Isn't it similar to my original problem?
My current understanding is that there are two options, only one of which applies to my situation:
To be able to delegate the job of longer term caching to the persistence layer than I need to have access to PersistenceContext and create a session scoped bean and set PersistenceContextType.EXTENDED. (this options doesn't apply to me, no access to PersistenceContext).
Configure the L2 #Cache annotation on entities, or like in option 1 above create a session scoped bean that will handle long term caching. But aren't these just going back to my original problem?
I'd really like to hear you opinion and see what do you think could be a reasonable way to approach this, or perhaps how you have been approaching it in your previous projects. Oh, and one more thing, just to confirm.. when annotating an entity with #Cache all linked entities will be cached along so I don't have to annotate all of them?
Again all the comments and pointers much appreciated.
Thanks for you r answer .. when you say
"In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server."
Does it mean I could for example declare and initialize static EntityManagerFactory field in an application scoped been to be later used by all the beans throughout the life of the application ?
EclipseLink uses a shared cache by default. This is shared for all EntityManagers accessed from an EntityManagerFactory. You do not need to do anything to enable caching.
In general, you do not want to be creating a new EntityManagerFactory per request, only a new EntityManager. Creating a new EntityManagerFactory is quite expensive, so not a good idea, even ignoring caching (it has its own connection pool, must initialize the meta-data, etc.).
In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server. Either never close it, or close it when a Servlet is destroyed.

Resources