Hibernate cache vs Spring cache

Can anybody explain the difference between the Hibernate second-level cache and the Spring cache?
Does it make sense to use both in a single application? If it is not recommended, when should each one be used?
A real-life, scenario-based explanation would be much appreciated, as it would make this easier to understand.

These are two completely different technologies. Hibernate and the Hibernate cache apply when you are working with relational databases: you use the Hibernate ORM to generate queries, store objects, and so on, with the domain model written in Java (entities). Sometimes it makes sense to keep some of these entities in memory to speed up queries, so you cache them with the Hibernate cache. There are many different kinds of caching there; I won't dive into the details because it's a general question, but read here if you want to know more about Hibernate caching.
Spring caching, on the other hand, is done by Spring and in general has nothing to do with the relational database/JDBC world; in other words, it lives outside the realm of Hibernate. You can cache an object to avoid, for example, a call to MongoDB, or to avoid performing an expensive calculation twice. You can cache the data in memory or in more advanced distributed technologies like Hazelcast, Redis or Infinispan (there are others as well).
Here you can find introductory material on Spring caching, and this is the far more complete official documentation.
So yes, to answer your question directly: it can make sense to use both in a single application :)
I really think you should get familiar with both, at least at the level of their concepts and goals, and then decide what is applicable in your case.

They are totally different.
Hibernate second-level cache
The Hibernate second-level cache is used in the context of Hibernate, so all sessions share the same instance. It is deactivated by default, and in order to use it you should enable it like this:
hibernate.cache.use_second_level_cache=true
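That flag alone is not enough: a cache provider (region factory) must also be configured. A minimal hibernate.properties sketch, assuming Hibernate 5.x with Ehcache as the provider (the exact property and class names vary slightly across Hibernate versions):

```properties
# turn on the second-level cache and, optionally, the query cache
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
# region factory backed by Ehcache (requires the hibernate-ehcache dependency)
hibernate.cache.region.factory_class=org.hibernate.cache.ehcache.EhCacheRegionFactory
```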
In order to make an entity eligible for second-level caching, we annotate it with the Hibernate-specific @org.hibernate.annotations.Cache annotation and specify a cache concurrency strategy.
Some developers consider it a good convention to add the standard @javax.persistence.Cacheable annotation as well (although it is not required by Hibernate), so an entity class implementation might look like this:
Example
@Entity
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Foo {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "ID")
    private long id;

    @Column(name = "NAME")
    private String name;

    // getters and setters
}
Spring cache
While the Hibernate second-level cache is used for caching instances of JPA entities and query results, Spring cache is aimed at caching the results of method calls on Spring beans.
Example
@Cacheable("addresses")
public String getAddress(Customer customer) {...}
The getAddress() call will first check the cache addresses before actually invoking the method and will then cache the result.
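Conceptually, what @Cacheable does around getAddress() can be sketched in plain Java with a map-backed memoizer. This is only an illustration of the check-cache-then-invoke behaviour, not Spring's actual implementation; AddressService and its lookupCount field are hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of what a cache-around-method interceptor does:
// consult the cache first and invoke the real method only on a miss.
class AddressService {
    private final Map<String, String> addresses = new ConcurrentHashMap<>();
    int lookupCount = 0; // counts real invocations, for illustration

    String getAddress(String customerId) {
        // computeIfAbsent plays the role of the @Cacheable interceptor
        return addresses.computeIfAbsent(customerId, this::loadAddress);
    }

    private String loadAddress(String customerId) {
        lookupCount++; // the expensive lookup would happen here
        return "address-of-" + customerId;
    }
}
```

Repeated calls with the same key then hit the map instead of the expensive lookup.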
I hope my explanation was clear.

One of their main differences is that the Hibernate second-level cache automatically maintains the cached entities whenever cached entities are updated or deleted, while Spring cache is a more general-purpose cache that knows nothing about Hibernate, so in such cases you have to invalidate the stale cached entities manually.
Also, entities cached by the Hibernate second-level cache remain managed by it, while with Spring cache they become detached. And dealing with detached entities is not easy if you are not familiar with it.
Whether it makes sense to use both caches always depends on the context. For me, using the Hibernate second-level cache correctly requires some learning curve, while Spring cache is more flexible due to its general-purpose nature but requires you to do more work yourself.
I would use the Hibernate second-level cache first, as it requires doing less once you master it, and consider using Spring cache if I came across a situation where Hibernate does not support configuring the caching behaviour that I want.
My real-life example: I have a background data-cleaning task that executes some native queries, which causes all entities to be removed from the second-level cache, which in turn affects one of my Hibernate query caches. Because the native queries are very complex, I failed to prevent the cache invalidation even using the tips mentioned here. So I switched to Spring cache to cache that query result.

Related

Optimizing findAll in Spring Data JPA

I have a table which has a list of lookup values max 50 rows.
Currently, I am querying this table every time to look for a particular value which is not efficient.
So, I am planning to optimize this by loading all the value at once as a List from the repository using findAll.
List<CardedList> findAll();
My question here is
Class A calls Class B, and Class B holds this repository. Will findAll() be queried every time Class A calls Class B?
class A {
    // for each item in the list, call Class B
    b.someMethod();
}

class B {
    @Autowired
    CardedListRepository cardRepo;

    someMethod() {
        cardRepo.findAll();
    }
}
What is the best way to achieve this?
If it is just 50 rows you could cache them in an instance variable of a service and check it like this:
class B {
    @Autowired
    CardedListRepository cardRepo;

    List<CardedList> cardedList = new ArrayList<>();

    someMethod() {
        if (cardedList.isEmpty()) {
            cardedList = cardRepo.findAll();
        }
        // do the rest of someMethod
    }
}
The proposed "solution" by @Juliyanage Silva (to "cache" the findAll query result as a simple instance variable of service B) can be very dangerous and should not be implemented before checking very carefully that it works under all circumstances.
Just imagine the same service instance being called from a subsequent transaction - you would end up with a (probably outdated) list of detached entities.
(e.g. leading to LazyInitializationExceptions when accessing uninitialized properties, etc.)
Hibernate already provides several caching mechanisms, as e.g. a standard first level cache, which avoids unnecessary DB round trips when looking for an already loaded entity by ID within the same transaction.
However, query results (as from findAll) are not cached by default, as explained in the documentation:
Caching of query results introduces some overhead in terms of your application's normal transactional processing. For example, if you cache results of a query against Person, Hibernate will need to keep track of when those results should be invalidated because changes have been committed against any Person entity.
That, coupled with the fact that most applications simply gain no benefit from caching query results, leads Hibernate to disable caching of query results by default.
To enable the Hibernate query cache, the second level cache needs to be configured. To prevent ending up with stale entries when having multiple application instances, this calls for a distributed cache (like Hazelcast or EhCache).
There are also various discussions on using Spring's caching mechanisms for this purpose. However, there are various pitfalls when it comes to caching collections, and when running multiple application instances you may need a distributed cache or another global invalidation mechanism, too.
How to add cache feature in Spring Data JPA CRUDRepository
Spring Cache with collection of items/entities
Spring Caching not working for findAll method
So depending on your use-case, it may be the easiest to just avoid unnecessary calls of service B by storing the result in a local variable within the calling method of service A.
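That last suggestion can be sketched like this. ServiceA, ServiceB and the in-memory repository stub are hypothetical stand-ins; the point is that findAll() runs once per calling method and is then held in a local variable, rather than being re-queried for every item:

```java
import java.util.List;

// Hypothetical sketch: fetch the lookup list once in the calling method
// and work on the local copy, instead of hitting the repository on
// every iteration.
class ServiceB {
    int queries = 0; // counts how often the "repository" was hit

    List<String> findAll() {          // stands in for cardRepo.findAll()
        queries++;
        return List.of("card-1", "card-2", "card-3");
    }

    boolean matches(String card, String value) {
        return card.equals(value);
    }
}

class ServiceA {
    boolean contains(ServiceB b, String value) {
        List<String> cards = b.findAll(); // single query, held in a local
        for (String card : cards) {
            if (b.matches(card, value)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the list lives only inside the calling method, there is no stale state surviving between transactions, which avoids the detached-entity problem described above.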

Spring caching implementation

I am exploring the Spring caching facility and have a few queries regarding it.
First, should caching be applied at the service-method level or the DAO-method level, where the service method calls the DAO method?
Second, how should I avoid cached data becoming stale?
IMO, the answer to both questions is "it depends".
Technically, Spring-wise, cache annotations applied at both the service and the DAO level will work; I don't think there is any difference, so it boils down to the concrete use case.
For example, if you "logically" plan to provide a cacheable abstraction of something calculated as a result of some computational process done on the server, you'd better go with caching at the service level.
If, on the other hand, you have a DAO method that looks like Something getSomethingById(id), and you would like to avoid relatively expensive calls to the underlying database, you can provide a cache at the level of the DAO. Having said that, it probably won't be useful to apply caching to methods like List<Something> fetchAll() or List<Something> fetchAllByFilter(). If you're working with JPA (implemented with Hibernate), it has its own cache abstraction, but that's somewhat beyond the scope of the question; just something you should be aware of...
There are plenty of tutorials available on the internet; some illustrate the service-based approach, some go for DAO-method annotations, but again, these are only simple examples; in the real world you'll have to make a decision.
Now regarding the second question: in general, caching makes sense if your data doesn't change much, so first off, if it changes often, caching is probably not appropriate/relevant for the use case.
Other than that, there are many techniques:
Cache Data Eviction (usually time based). See this tutorial
Some kind of messaging system that sends a message about a cache-entry change. This one is especially useful if you have a distributed application and keep the cache in-memory only. When getting a message you might opt for "cache replication" or for totally wiping out the cache, so that it will eventually be "filled" with new data.
Using distributed cache technologies like Hazelcast or Redis as opposed to in-memory caching, so that the consistency of the cached data is guaranteed by the caching provider.
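The first technique, time-based eviction, can be sketched in plain Java. The TtlCache class and its injectable clock are hypothetical, purely for illustration; a real application would rely on the TTL settings of a library like Ehcache, Caffeine or Redis instead:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical minimal TTL cache: an entry is served only while it is
// younger than ttlMillis; the clock is injectable so expiry is testable.
class TtlCache<K, V> {
    private record Entry<T>(T value, long storedAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || clock.getAsLong() - e.storedAt() >= ttlMillis) {
            entries.remove(key); // expired or absent: evict and report a miss
            return null;
        }
        return e.value();
    }
}
```

A caller sees the cached value until the TTL elapses, after which the entry is evicted and must be reloaded, which bounds how stale the data can get.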
I would also like to recommend this tutorial - the speaker talks about different aspects of caching implementation and I think it's really relevant to your question.

Caching (Ehcache) - Hibernate 2nd level cache and Spring

In my web application (Spring 3.1, Hibernate 4), I am using Ehcache for the Hibernate 2nd-level cache and Spring @Cache. I would like to know where to use the Hibernate cache and where to use the Spring cache.
For example, I have a few domain classes (views in the database) which I use as lookup values on screen. I can cache them using the Hibernate 2nd-level cache as well as Spring @Cache.
So, in my service layer, if I cache these domain objects using Spring @Cache, I would receive these objects without hitting the persistence layer (the Hibernate HQL query) at all, once cached. Is this the right approach?
It depends on your layer architecture.
Assume you have three services (or three methods within the same service) that all return a collection of Customer entities i.e. domain objects. If you cache at service layer there's a fair chance the same representation of a single database record will live in the cache multiple times. They are multiple objects of essentially the same information. Why? Because the results of Service.getWhateverCustomers(String) and Service.getWhateverCustomers(String, Integer) are stored under two different cache keys.
If, on the other hand, you cache at the entity level using the JPA @Cacheable annotation, your Customer entity is cached no matter from which service or service method you call the code that actually retrieves the entity. Of course, the rules about when the JPA provider can/does cache an entity apply; read up on them if you're not familiar with them.
Hope this gives you an idea which path to follow. Post follow-up comments if you have more questions and I'll edit this answer.
The right approach is:
Ask yourself if you even need to mess with the complexity of caching. Is your app failing to perform up to requirements?
Only if the answer to the previous question is "yes", profile your app to find out where the performance problem(s) is/are.
Determine the appropriate way to solve a performance problem identified in step 2. It may or may not involve caching to prevent costly operations. If it does involve caching, then where to cache and which cache to use should be abundantly clear because you'll know exactly what you're trying to prevent from happening.
The moral of the story is that you don't cache because it's cool. You cache for performance. And you only optimize code when it's proven necessary.

Spring, Hibernate - Batch processing of large amounts of data with good performance

Imagine you have a large amount of data in a database, approx. ~100 MB. We need to process all the data somehow (update it or export it somewhere else). How do you implement this task with good performance? How do you set up transaction propagation?
Example 1 (with bad performance):
@Singleton
public class ServiceBean {

    public void processAllData() {
        List<Entity> entityList = dao.findAll();
        for (Entity entity : entityList) {
            process(entity);
        }
    }

    private void process(Entity entity) {
        // data processing
        // saves the data back (UPDATE operation) or exports it somewhere else (just READs from the DB)
    }
}
What could be improved here ?
In my opinion:
I would set the Hibernate batch size (see the Hibernate documentation on batch processing).
I would separate ServiceBean into two Spring beans with different transaction settings. The method processAllData() should run outside a transaction, because it operates on large amounts of data and a potential rollback wouldn't be 'quick' (I guess). The method process(Entity entity) would run in a transaction - no big deal to roll back in the case of a single data entity.
Do you agree ? Any tips ?
Here are 2 basic strategies:
JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction.
Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL.
You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
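The flush/clear rhythm behind the first strategy can be sketched in plain Java. BatchWriter and its counters are hypothetical stand-ins for a Hibernate Session; in real code the flush step would call session.flush() followed by session.clear() every batchSize entities so the persistence context never grows unbounded:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Hibernate batching idiom: buffer pending
// entities and flush+clear every batchSize items.
class BatchWriter {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    int flushes = 0; // how many batches went to the database
    int written = 0; // total entities written

    BatchWriter(int batchSize) {
        this.batchSize = batchSize;
    }

    void save(String entity) {
        pending.add(entity);               // session.save(entity)
        if (pending.size() >= batchSize) {
            flush();                       // session.flush(); session.clear();
        }
    }

    void finish() {                        // flush the final partial batch
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        flushes++;
        written += pending.size();
        pending.clear();                   // mirrors session.clear()
    }
}
```

With a batch size of 20, saving 50 entities produces three flushes (20 + 20 + 10) instead of 50 individual round trips.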
There are a few things to keep in mind here:
Loading all entities into memory with a findAll method can lead to OOM exceptions.
You need to avoid attaching all of the entities to a session, since every time Hibernate executes a flush it will need to dirty-check every attached entity. This will quickly grind your processing to a halt.
Hibernate provides a stateless session, which you can use with a scrollable result set to scroll through the entities one by one - docs here. You can then use this session to update the entities without ever attaching them to a session.
The other alternative is to use a stateful session but clear the session at regular intervals, as shown here.
I hope this is useful advice.

JPA2 Entities Caching

As it stands, I am using a JSF request-scoped bean to do all my CRUD operations. As I'm sure you know, Tomcat doesn't provide container-managed persistence, so in my CRUD request bean I am using an EntityManagerFactory to get hold of an entity manager. Now, the validity of my choice to use a request-scoped bean for this task is probably open for discussion (again), but I've been trying to put it in the context of what I've read in the articles you gave me links to, specifically the first and second ones. From what I gather, EclipseLink uses a Level 2 cache by default, which stores cached entities. On the EclipseLink Examples - JPA Caching website it says that:
The shared cache exists for the duration of the persistence unit (EntityManagerFactory, or server)
Now, doesn't that make my cached entities live only for a fraction of time, during the call made to the CRUD request bean? Because the moment the bean is destroyed, and with it the EntityManagerFactory, so is the cache. Also, the last part of the above sentence, "EntityManagerFactory, or server", confuses me: what precisely is meant by "or server" in this context, and how does one control it? If I use the @Cache annotation and set an appropriate expiry attribute, will that do the job and keep the entities stored in the server's L2 cache, regardless of whether my EntityManagerFactory has been destroyed?
I understand there is a lot to consider and each application has specific requirements. From my point of view, configuring the L2 cache is probably the most desirable (if not the only, on Tomcat) option to get things optimized. Quoting from your first link:
The advantages of L2 caching are:
avoids database access for already loaded entities
faster for reading frequently accessed unmodified entities
The disadvantages of L2 caching are:
memory consumption for large amount of objects
stale data for updated objects
concurrency for write (optimistic lock exception, or pessimistic lock)
bad scalability for frequent or concurrently updated entities
You should configure L2 caching for entities that are:
read often
modified infrequently
not critical if stale
Almost all of the above points apply to my app. At the heart of it, amongst other things, is constant and relentless reading of entities and displaying them on the website (the app will serve as a portal for listing properties). There's also a small shopping cart being built into the application, but the products sold are not tangible items that come as stock, but services. In this case stale entities are no problem, and neither, I think, is concurrency, as the products (here, services) will never be written to. So the entities will be read often and modified infrequently (and those modified are not part of the cart anyway, and even those are modified rarely), and are therefore not critical if stale. Finally, the first two points seem to be exactly what I need, namely avoiding database access for already loaded entities and fast reading of frequently accessed unmodified entities. But there is one point under disadvantages which still concerns me a bit: memory consumption for a large number of objects. Isn't that similar to my original problem?
My current understanding is that there are two options, only one of which applies to my situation:
To delegate the job of longer-term caching to the persistence layer, I would need access to the PersistenceContext, create a session-scoped bean and set PersistenceContextType.EXTENDED. (This option doesn't apply to me - no access to the PersistenceContext.)
Configure the L2 @Cache annotation on entities, or, as in option 1 above, create a session-scoped bean that will handle long-term caching. But isn't this just going back to my original problem?
I'd really like to hear your opinion and see what you think could be a reasonable way to approach this, or perhaps how you have approached it in your previous projects. Oh, and one more thing, just to confirm: when annotating an entity with @Cache, will all linked entities be cached along with it, so I don't have to annotate all of them?
Again all the comments and pointers much appreciated.
Thanks for your answer. When you say:
"In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server."
Does that mean I could, for example, declare and initialize a static EntityManagerFactory field in an application-scoped bean, to be used later by all the beans throughout the life of the application?
EclipseLink uses a shared cache by default. This is shared for all EntityManagers accessed from an EntityManagerFactory. You do not need to do anything to enable caching.
In general, you do not want to be creating a new EntityManagerFactory per request, only a new EntityManager. Creating a new EntityManagerFactory is quite expensive, so not a good idea, even ignoring caching (it has its own connection pool, must initialize the meta-data, etc.).
In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server. Either never close it, or close it when a Servlet is destroyed.
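Such a static manager can be sketched as a lazy holder. It is shown here with a generic Supplier so the sketch stays self-contained; in a real Tomcat app the factory would call Persistence.createEntityManagerFactory(...) and the result would be closed from a ServletContextListener when the server shuts down. SharedFactoryHolder and its creations counter are hypothetical names:

```java
import java.util.function.Supplier;

// Hypothetical lazy singleton holder: the expensive factory (think
// EntityManagerFactory creation) runs once, on first use, and the
// instance is then shared for the lifetime of the server.
class SharedFactoryHolder<T> {
    private final Supplier<T> factory;
    private volatile T instance;
    int creations = 0; // for illustration: how often the factory ran

    SharedFactoryHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {       // double-checked locking
                    creations++;
                    instance = local = factory.get();
                }
            }
        }
        return local;
    }
}
```

Every caller then shares the one instance, so the shared L2 cache survives across requests instead of dying with each request-scoped bean.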
