Optimizing findAll in Spring Data JPA - spring-boot

I have a table which holds a list of lookup values (at most 50 rows).
Currently, I am querying this table every time I look for a particular value, which is not efficient.
So, I am planning to optimize this by loading all the values at once as a List from the repository using findAll:
List<CardedList> findAll();
My question here is:
Class A calls Class B, and Class B holds this repository. Will findAll query the database every time Class A calls Class B?
class A {
    // for each item in the list, call Class B
    b.someMethod();
}

class B {
    @Autowired
    CardedListRepository cardRepo;

    void someMethod() {
        cardRepo.findAll();
    }
}
What is the best way to achieve this?

If it is just 50 rows, you could cache them in an instance variable of a service and check like this:
class B {
    @Autowired
    CardedListRepository cardRepo;

    private List<CardedList> cardedList = new ArrayList<>();

    void someMethod() {
        // lazily load the lookup values on first use
        if (cardedList.isEmpty()) {
            cardedList = cardRepo.findAll();
        }
        // do the rest of someMethod
    }
}

The proposed "solution" by @Juliyanage Silva (to "cache" the findAll query result as a simple instance variable of service B) can be very dangerous and should not be implemented before checking very carefully that it works under all circumstances.
Just imagine the same service instance being called from a subsequent transaction - you would end up with a (probably outdated) list of detached entities.
(e.g. leading to LazyInitializationExceptions when accessing uninitialized properties, etc.)
Hibernate already provides several caching mechanisms, as e.g. a standard first level cache, which avoids unnecessary DB round trips when looking for an already loaded entity by ID within the same transaction.
However, query results (as from findAll) are not cached by default, as explained in the documentation:
Caching of query results introduces some overhead in terms of your application's normal transactional processing. For example, if you cache results of a query against Person, Hibernate will need to keep track of when those results should be invalidated because changes have been committed against any Person entity.
That, coupled with the fact that most applications simply gain no benefit from caching query results, leads Hibernate to disable caching of query results by default.
To enable the Hibernate query cache, the second level cache needs to be configured. To prevent ending up with stale entries when having multiple application instances, this calls for a distributed cache (like Hazelcast or EhCache).
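As a rough sketch, enabling the query cache on top of the second-level cache boils down to properties like the following; the concrete region factory class is provider- and version-specific, so treat that line as an assumption:
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
# provider-specific, e.g. the JCache binding (assumed):
hibernate.cache.region.factory_class=org.hibernate.cache.jcache.JCacheRegionFactory
On top of that, each query whose results should be cached must itself be marked as cacheable, e.g. via the org.hibernate.cacheable query hint.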
There are also various discussions on using Spring's caching mechanisms for this purpose. However, there are also various pitfalls when it comes to caching collections. And when running multiple application instances, you may need a distributed cache or another global invalidation mechanism, too.
How to add cache feature in Spring Data JPA CRUDRepository
Spring Cache with collection of items/entities
Spring Caching not working for findAll method
So depending on your use case, it may be easiest to just avoid unnecessary calls to service B by storing the result in a local variable within the calling method of service A.
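For illustration, a minimal sketch of that last suggestion; the accessor on service B and the surrounding loop are assumptions, not code from the question:

class A {

    @Autowired
    private B b;

    void processItems(List<Item> items) {
        // call service B once and keep the result in a local variable
        List<CardedList> cardedValues = b.findAllCarded(); // hypothetical accessor delegating to cardRepo.findAll()
        for (Item item : items) {
            // match each item against cardedValues locally instead of
            // triggering another query per iteration
        }
    }
}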

Related

Hibernate cache vs spring cache

Can anybody explain the difference between the Hibernate second-level cache and Spring cache?
Does it make sense to use both in a single application? If it is not recommended, when should one use which?
A real-life, scenario-based explanation would be much appreciated, as it would make this much easier to understand.
These are two completely different technologies. Hibernate and the Hibernate cache are applicable in general when you're working with relational databases. You can use Hibernate ORM to generate queries, store objects, etc. The domain model is written in Java (entities). Sometimes it makes sense to cache some of these entities in memory to speed up queries, so you cache them with the Hibernate cache. There are many different kinds of caching there; I won't dive into the details because it's a general question, but read here if you want to know more about Hibernate caching.
Spring caching, on the other hand, is done by Spring and in general has nothing to do with the relational database/JDBC world; in other words, it lives outside the realm of Hibernate. You can cache an object to avoid, for example, a call to MongoDB, or to avoid performing an expensive calculation twice. You can cache the data in memory or in more advanced distributed technologies like Hazelcast, Redis or Infinispan (there are others as well).
Here you can find introductory material on Spring caching. And this is the much more complete official documentation.
So yes, to directly answer your question, it might make sense to use both in a single application :)
I really think you should get familiar with both, at least at the level of concepts and their goals, and then decide what is applicable in your case.
They are totally different.
Hibernate second-level cache
The Hibernate second-level cache is used in the context of Hibernate, so all sessions share the same instance. It's deactivated by default, and in order to use it you should enable it like this:
hibernate.cache.use_second_level_cache=true
In order to make an entity eligible for second-level caching, we annotate it with the Hibernate-specific @org.hibernate.annotations.Cache annotation and specify a cache concurrency strategy.
Some developers consider it a good convention to add the standard @javax.persistence.Cacheable annotation as well (although not required by Hibernate), so an entity class implementation might look like this:
Example
@Entity
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Foo {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "ID")
    private long id;

    @Column(name = "NAME")
    private String name;

    // getters and setters
}
Spring cache
While the Hibernate second-level cache is used for caching instances of JPA entities and query results, Spring cache is aimed at caching the results of method calls on Spring beans.
Example
@Cacheable("addresses")
public String getAddress(Customer customer) {...}
A call to getAddress() will first check the cache named addresses before actually invoking the method, and will then cache the result.
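Note that Spring's cache abstraction has to be switched on before @Cacheable takes effect; a minimal sketch (the simple in-memory cache manager shown here is just a default, a real setup might plug in Ehcache, Redis, etc.):

@Configuration
@EnableCaching
public class CacheConfig {

    // a simple in-memory cache manager backing the "addresses" cache;
    // swap in a provider-specific CacheManager for production use
    @Bean
    public CacheManager cacheManager() {
        return new ConcurrentMapCacheManager("addresses");
    }
}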
I hope I was clear in my explanation.
One of their main differences is that the Hibernate second-level cache automatically maintains the cached entities whenever cached entities are updated or deleted, while Spring cache is a more general-purpose cache that knows nothing about Hibernate, so you have to invalidate stale cache entries manually in such cases.
Also, entities cached by the Hibernate second-level cache remain managed by it, while with Spring cache they become detached. Dealing with detached entities is not easy if you are not familiar with them.
Whether it makes sense to use both caches always depends on the context. For me, using the Hibernate second-level cache involves some learning curve to use it correctly. Spring cache is more flexible due to its general-purpose nature, but it requires you to do more work yourself.
I would use the Hibernate second-level cache first, as it requires doing less once you master it, and consider using Spring cache if I came across a situation where Hibernate does not support configuring the caching behaviour I want.
My real-life example: I have a background data-cleaning task that executes some complex native queries, which causes all entities to be removed from the second-level cache and in turn invalidates one of my Hibernate query caches. Because the native queries are so complex, I failed to keep them from invalidating the cache even using the tips mentioned here. So I switched to Spring cache for that query result.

Caching (Ehcache) - Hibernate 2nd level cache and Spring

In my web application (Spring 3.1, Hibernate 4), I am using Ehcache for the Hibernate second-level cache and Spring @Cache. I would like to know where to use the Hibernate cache and where to use the Spring cache.
For example, I have a few domain classes (views in the database) which I use as lookup values on screen. I can cache them using the Hibernate second-level cache as well as Spring @Cache.
So, if I cache these domain objects in my service layer using Spring @Cache, I would receive these objects without hitting the persistence layer at all (the Hibernate HQL query), once cached. Is this the right approach?
It depends on your layer architecture.
Assume you have three services (or three methods within the same service) that all return a collection of Customer entities, i.e. domain objects. If you cache at the service layer, there's a fair chance the same representation of a single database record will live in the cache multiple times: multiple objects carrying essentially the same information. Why? Because the results of Service.getWhateverCustomers(String) and Service.getWhateverCustomers(String, Integer) are stored under two different cache keys.
If you cache at the entity level using the JPA @Cacheable annotation, on the other hand, your Customer entity is cached no matter from which service or service method you call the code that actually retrieves the entity. Of course, the rules about when the JPA provider can/does cache an entity apply. Read up on them if you're not familiar with them.
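To make the duplication concrete, a small sketch; the repository and finder method are assumptions, only the two service signatures come from the example above:

@Service
public class CustomerService {

    @Autowired
    private CustomerRepository customerRepository; // hypothetical Spring Data repository

    @Cacheable("customersByRegion")
    public List<Customer> getWhateverCustomers(String region) {
        return customerRepository.findByRegion(region);
    }

    // different parameter list, therefore a different cache key: the same
    // Customer rows may now sit in the cache twice
    @Cacheable("customersByRegionLimited")
    public List<Customer> getWhateverCustomers(String region, Integer limit) {
        return customerRepository.findByRegion(region).stream()
                .limit(limit)
                .collect(Collectors.toList());
    }
}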
Hope this gives you an idea which path to follow. Post follow-up comments if you have more questions and I'll edit this answer.
The right approach is:
Ask yourself if you even need to mess with the complexity of caching. Is your app failing to perform up to requirements?
Only if the answer to the previous question is "yes", profile your app to find out where the performance problem(s) is/are.
Determine the appropriate way to solve a performance problem identified in step 2. It may or may not involve caching to prevent costly operations. If it does involve caching, then where to cache and which cache to use should be abundantly clear because you'll know exactly what you're trying to prevent from happening.
The moral of the story is that you don't cache because it's cool. You cache for performance. And you only optimize code when it's proven necessary.

Problems with Spring and Hibernate SessionFactory: Domain object scope restricted to session

I have been using the session factory (a singleton bean injected into the DAO objects) in my Spring/Hibernate application. I am using a service layer architecture, and I have the following issue:
Any time I get a domain object from the database, it uses a new session provided by the Hibernate session factory. When requesting the same row several times, this leads to having multiple instances of that same domain object. (Using a single session would instead return multiple objects pointing to the same reference.) Thus, any changes made to one of those domain objects are not taken into account by the other domain objects representing the same row.
I am developing a Swing application with multiple views, and I fetch the same DB row from different locations (and queries), so I need to obtain domain objects pointing to the same instance.
My question then is: is there a way to make this happen using the SessionFactory? If not, is it good practice to use a single session for my whole application? In that case, how and where should I declare this session? (Should it be a bean injected into the DAO objects just like the sessionFactory?)
Thank you in advance for your help
The Hibernate session (I will call it h-session) in Spring is usually bound to a thread (see the JavaDoc for HibernateTransactionManager), so an h-session is acquired once per thread.
The first-level cache (the h-session cache, which is always turned on) is used to return the same object if you call get or load several times on one h-session. But this cache doesn't work for queries.
Also, you shouldn't forget about problems related to transaction isolation. Most applications use the "read committed" isolation level, which is subject to the phenomenon known as "non-repeatable reads": you could receive several versions of the same row in one transaction if you query for that row several times (because the row could be updated between queries by another transaction).
So you shouldn't query several times for the same data in one h-session/transaction.
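A small sketch of the first-level cache behaviour (assuming a Customer entity with a Long id):

Session session = sessionFactory.openSession();
Customer a = (Customer) session.get(Customer.class, 1L);
Customer b = (Customer) session.get(Customer.class, 1L); // served from the first-level cache
assert a == b; // same managed instance, only one DB round trip
// a query, by contrast, always goes to the database:
List<Customer> all = session.createQuery("from Customer").list();
session.close();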
You're looking for the Open Session in View pattern. Essentially, you want to bind a Session to your thread on application startup and use the same Session throughout the lifetime of the application. You can do this by creating a singleton util class which keeps a session, like so (note that the example I have uses an EntityManager instead of a Session, but your code will be essentially the same):
private static EntityManagerFactory entityManagerFactory; // assumed to be initialized at startup
private static EntityManager entityManager;

public static synchronized void setupEntityManager() {
    if (entityManager == null) {
        entityManager = entityManagerFactory.createEntityManager();
    }
    // bind the EntityManager to the current thread so Spring's transaction
    // infrastructure will pick it up
    if (!TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
        TransactionSynchronizationManager.bindResource(entityManagerFactory, new EntityManagerHolder(entityManager));
    }
}

public static synchronized void tearDownEntityManager() {
    if (entityManager != null) {
        if (entityManager.isOpen()) {
            entityManager.close();
        }
        if (TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
            TransactionSynchronizationManager.unbindResource(entityManagerFactory);
        }
        if (entityManagerFactory.isOpen()) {
            entityManagerFactory.close();
        }
    }
}
Note that there are inherent risks associated with the Open Session in View pattern. For example, I noticed in the comments that you intend to use threading in your application. Sessions are not thread-safe, so you'll have to make sure you aren't trying to access the database in a threaded manner.*
You'll also have to be more aware of your fetching strategy for collections. With an open session and lazy loading there's always the chance that you'll put undue load on your database.
*I've used this approach in a NetBeans application before, which I know uses threading for certain tasks. We never had any problems with it, but you need to be aware of the risks, of which there are many.
Edit
Depending on your situation, it may also be possible to evict your domain objects from the Session and cache the detached objects for later use. This strategy would of course require that your domain objects not be updated very often; otherwise your application would become unnecessarily complicated.

Spring, Hibernate - Batch processing of large amounts of data with good performance

Imagine you have a large amount of data in a database, approx. ~100 MB. We need to process all of it somehow (update it or export it somewhere else). How do we implement this task with good performance? How should transaction propagation be set up?
Example 1 (with bad performance):
@Singleton
public class ServiceBean {

    public void processAllData() {
        List<Entity> entityList = dao.findAll();
        for (Entity entity : entityList) {
            process(entity);
        }
    }

    private void process(Entity entity) {
        // data processing
        // saves data back (UPDATE operation) or exports it somewhere else (just READs from the DB)
    }
}
What could be improved here?
In my opinion:
I would set the Hibernate batch size (see the Hibernate documentation on batch processing).
I would separate ServiceBean into two Spring beans with different transaction settings. The method processAllData() should run outside a transaction, because it operates on a large amount of data and a potential rollback wouldn't be 'quick' (I guess). The method process(Entity entity) would run in a transaction; rolling back a single data entity is no big deal.
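A sketch of what that separation could look like; the bean names and the REQUIRES_NEW propagation are assumptions, and dao stands for the repository from the example:

@Service
public class BatchService {

    @Autowired
    private EntityProcessor entityProcessor;

    // intentionally not @Transactional: the iteration runs outside any transaction
    public void processAllData() {
        for (Entity entity : dao.findAll()) {
            entityProcessor.process(entity);
        }
    }
}

@Service
public class EntityProcessor {

    // each entity is processed in its own short transaction, so a rollback
    // only ever affects a single entity
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void process(Entity entity) {
        // data processing + update/export
    }
}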
Do you agree? Any tips?
Here are 2 basic strategies:
JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to clear() your Session so that you don't run into memory issues during a large transaction (see the sketch below).
Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or an anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while harnessing the full power of native SQL.
You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
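As a rough sketch of the first strategy (session handling and entity names are assumed; the flush interval should match hibernate.jdbc.batch_size):

int batchSize = 50;
List<Entity> entities = dao.findAll();
for (int i = 0; i < entities.size(); i++) {
    process(entities.get(i)); // modifies the attached entity
    if (i > 0 && i % batchSize == 0) {
        // push pending UPDATEs out in one JDBC batch, then detach everything
        // so the session does not dirty-check the whole list on every flush
        session.flush();
        session.clear();
    }
}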
There are a few things to keep in mind here:
Loading all entities into memory with a findAll method can lead to OOM exceptions.
You need to avoid attaching all of the entities to a session, since every time Hibernate executes a flush it will need to dirty-check every attached entity. This will quickly grind your processing to a halt.
Hibernate provides a stateless session, which you can use with a scrollable result set to scroll through entities one by one - docs here. You can then use this session to update entities without ever attaching them to a session (see the sketch below).
The other alternative is to use a stateful session but clear the session at regular intervals as shown here.
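A minimal sketch of the stateless-session variant, using Hibernate 4-era APIs (the entity name is assumed):

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
ScrollableResults results = session.createQuery("from Entity").scroll(ScrollMode.FORWARD_ONLY);
while (results.next()) {
    Entity entity = (Entity) results.get(0);
    // process and write back; nothing is attached to a persistence context,
    // so there is no dirty checking and no first-level cache
    session.update(entity);
}
tx.commit();
results.close();
session.close();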
I hope this is useful advice.

JPA2 Entities Caching

As it stands I am using a JSF request-scoped bean to do all my CRUD operations. As I'm sure you most likely know, Tomcat doesn't provide container-managed persistence, so in my CRUD request bean I am using an EntityManagerFactory to get hold of an entity manager. Now, about the validity of my choice to use a request-scoped bean for this task, it's probably open for a discussion (again), but I've been trying to put it in the context of what I've read in the articles you gave me links to, specifically the first and second one. From what I gather, EclipseLink uses a Level 2 cache by default, which stores cached entities. On the EclipseLink Examples - JPA Caching website it says that:
The shared cache exists for the duration of the persistence unit (EntityManagerFactory, or server)
Now, doesn't that make my cached entities live only for the duration of the call being made to the CRUD request bean, because the moment the bean is destroyed, and with it the EntityManagerFactory, so is the cache? Also, the last part of the above sentence, "EntityManagerFactory, or server", gets me confused: what precisely is meant by "or server" in this context, and how does one control it? If I use the @Cache annotation and set an appropriate expire attribute, will that do the job and keep the entities stored in the server's L2 cache, regardless of whether my EntityManagerFactory has been destroyed?
I understand there is a lot to consider and each application has specific requirements. From my point of view, configuring the L2 cache is probably the most desirable (if not the only, on Tomcat) option to get things optimized. Quoting from your first link:
The advantages of L2 caching are:
avoids database access for already loaded entities
faster for reading frequently accessed unmodified entities
The disadvantages of L2 caching are:
memory consumption for large amount of objects
stale data for updated objects
concurrency for write (optimistic lock exception, or pessimistic lock)
bad scalability for frequent or concurrently updated entities
You should configure L2 caching for entities that are:
read often
modified infrequently
not critical if stale
Almost all of the above points apply to my app. At the heart of it, amongst other things, is constant and relentless reading of entities and displaying them on the website (the app will serve as a portal for listing properties). There's also a small shopping cart being built into the application, but the products sold are not tangible items that come as stock but services. In this case stale entities are no problem, and neither, I think, is concurrency, as the products (here, services) will never be written to. So the entities will be read often and modified infrequently (and those modified are not part of the cart anyway, and even those are modified rarely), and are therefore not critical if stale. Finally, the first two points seem to be exactly what I need, namely avoiding database access for already loaded entities and fast reading of frequently accessed unmodified entities. But there is one point in the disadvantages which still concerns me a bit: memory consumption for a large amount of objects. Isn't that similar to my original problem?
My current understanding is that there are two options, only one of which applies to my situation:
To be able to delegate the job of longer-term caching to the persistence layer, I need access to the PersistenceContext, and would create a session-scoped bean and set PersistenceContextType.EXTENDED. (This option doesn't apply to me; I have no access to the PersistenceContext.)
Configure the L2 @Cache annotation on entities, or, as in option 1 above, create a session-scoped bean that will handle long-term caching. But isn't that just going back to my original problem?
I'd really like to hear your opinion on what could be a reasonable way to approach this, or perhaps how you have approached it in your previous projects. Oh, and one more thing, just to confirm: when annotating an entity with @Cache, will all linked entities be cached along with it, so that I don't have to annotate all of them?
Again, all comments and pointers are much appreciated.
Thanks for your answer. When you say:
"In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server."
Does this mean I could, for example, declare and initialize a static EntityManagerFactory field in an application-scoped bean, to be used later by all the beans throughout the life of the application?
EclipseLink uses a shared cache by default. This is shared for all EntityManagers accessed from an EntityManagerFactory. You do not need to do anything to enable caching.
In general, you do not want to create a new EntityManagerFactory per request, only a new EntityManager. Creating a new EntityManagerFactory is quite expensive, so it is not a good idea even ignoring caching (it has its own connection pool, must initialize the meta-data, etc.).
In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server. Either never close it, or close it when a Servlet is destroyed.
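A minimal sketch of such a static holder (the class and persistence unit names are assumptions):

public final class PersistenceManager {

    // created once for the lifetime of the server, never per request
    private static final EntityManagerFactory EMF =
            Persistence.createEntityManagerFactory("myPersistenceUnit");

    private PersistenceManager() {
    }

    public static EntityManager createEntityManager() {
        // EntityManagers are cheap: open one per request/unit of work and close it afterwards
        return EMF.createEntityManager();
    }

    public static void close() {
        // call this when the server shuts down, e.g. from a ServletContextListener
        if (EMF.isOpen()) {
            EMF.close();
        }
    }
}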
