Spring JPA: keeping the persistence context small

How can I keep the persistence context small in a Spring JPA environment?
Why: I know that by keeping the persistence context small, there will be a significant performance boost!
The main problem area is:
@Transactional
void methodA() {
    WHILE retrieving next object of 51M (via a stateless session connection) DO
        get some further (read-only) data
        IF condition holds THEN
            assessment = retrieve assessment object (= record from database)
            change assessment data
            save the assessment to the database
}
Via experiments in this problem domain I know that when cleaning the persistence context every 250 iterations, then the performance will be a lot better.
When I add these lines to the code, so that every 250 iterations the persistence context is flushed and cleared:
@PersistenceContext
private EntityManager em;

WHILE ...
    ...
    IF counter++ % 250 == 0 THEN
        em.flush()
        em.clear()
}
Then I get errors like "cannot reliably perform the flush operation".
I tried to make the main @Transactional read-only and to run the assessment-save part with propagation REQUIRES_NEW, but then I get errors like 'operating on a detached entity'. Very strange, because I never revisit an open entity.
So, how can I keep the persistence context small?
I have tried dozens of approaches. Help is really appreciated.

I would suggest you move all the condition logic into your query so that you don't even have to load that many rows/objects. Or even better, write an update query that does all of that in a single transaction so you don't need to transfer any data at all between your application and database.
I don't think that flushing is necessary with a stateless session, as it doesn't keep state, i.e. it flushes and clears the persistence context after every operation. Apart from that, I also think this might not be what you really want, as it could lead to re-fetching of data.
If you don't want the persistence context to fill up, then use DTOs for fetching the data and execute update statements to flush the changes.
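Here's a minimal sketch of that DTO-plus-update-statement approach, assuming a hypothetical Assessment entity with id and status columns (all names below are illustrative, not from the question):

import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical read-only projection: just the columns the condition needs.
// In a real app the record lives in a package, e.g. com.example.
record AssessmentView(Long id, String status) {}

@Service
class AssessmentBatchService {

    @PersistenceContext
    private EntityManager em;

    @Transactional
    public void process() {
        // 1. Fetch DTOs instead of managed entities: nothing gets attached
        //    to the persistence context, so it never grows.
        var views = em.createQuery(
                "select new com.example.AssessmentView(a.id, a.status) "
                + "from Assessment a where a.status = :status",
                AssessmentView.class)
            .setParameter("status", "OPEN")
            .getResultList();

        // ... apply the condition logic to the DTOs here ...

        // 2. Write the changes with a bulk update statement: no entities
        //    are loaded, attached, or dirty-checked.
        int updated = em.createQuery(
                "update Assessment a set a.status = :newStatus "
                + "where a.status = :oldStatus")
            .setParameter("newStatus", "PROCESSED")
            .setParameter("oldStatus", "OPEN")
            .executeUpdate();
    }
}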

Related

Problems with Spring and Hibernate SessionFactory: Domain object scope restricted to session

I have been using the session factory (a singleton bean injected into the DAO objects) in my Spring/Hibernate application. I am using a service-layer architecture, and I have the following issue:
Any time I get a domain object from the database, it uses a new session provided by the Hibernate session factory. When the same row is requested several times, this leads to multiple instances of that same domain object. (With a single session, the queries would return multiple references to the same instance.) Thus, any changes made to one of those domain objects are not visible through the other domain objects representing the same row.
I am developing a Swing application with multiple views, and I get the same DB row from different locations (and queries), so I need the domain objects I obtain to point to the same instance.
My question is then: is there a way to make this happen using the SessionFactory? If not, is it good practice to use a single session for my whole application? In that case, how and where should I declare this session? (Should it be a bean injected into the DAO objects, just like the sessionFactory?)
Thank you in advance for your help
The Hibernate session (I will call it h-session) in Spring is usually bound to the thread (see the JavaDoc for HibernateTransactionManager), so an h-session is acquired once per thread.
The first-level cache (the h-session cache, which is always on) is used to return the same object if you call get or load several times on one h-session. But this cache doesn't avoid database hits for queries.
Also, you shouldn't forget about problems related to transaction isolation. Most applications use the "read committed" isolation level, which is subject to the phenomenon known as "non-repeatable reads": you can receive several versions of the same row in one transaction if you query for that row several times, because the row can be updated between queries by another transaction.
So you shouldn't query several times for the same data in one h-session/transaction.
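A minimal sketch of that first-level cache behaviour, assuming Hibernate 6 and an illustrative Account entity (and that row 1 exists):

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

@Entity
class Account { // illustrative entity
    @Id
    Long id;
}

class FirstLevelCacheDemo {

    void demo(SessionFactory sessionFactory) {
        try (Session session = sessionFactory.openSession()) {
            // The first get() hits the database and caches the instance
            // in the h-session (first-level cache).
            Account a1 = session.get(Account.class, 1L);

            // The second get() is served from the session cache: no SQL,
            // and it is the very same object reference.
            Account a2 = session.get(Account.class, 1L);
            assert a1 == a2;

            // A query, however, always executes SQL even though the row
            // is already cached; the result is then resolved against the
            // session, so the reference is still a1.
            Account a3 = session
                .createQuery("from Account where id = :id", Account.class)
                .setParameter("id", 1L)
                .getSingleResult();
            assert a3 == a1;
        }
    }
}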
You're looking for the Open Session in View Pattern. Essentially, you want to bind a Session to your thread on application startup and use the same Session throughout the lifetime of the application. You can do this by creating a singleton util class which keeps a session like so (note that the example I have uses an EntityManager instead of a Session, but your code will be essentially the same):
// The factory must be created once (e.g. at startup) and held for the
// lifetime of the application; it was not shown in the original snippet.
private static EntityManagerFactory entityManagerFactory;
private static EntityManager entityManager;

public static synchronized void setupEntityManager() {
    if (entityManager == null) {
        entityManager = entityManagerFactory.createEntityManager();
    }
    if (!TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
        TransactionSynchronizationManager.bindResource(
                entityManagerFactory, new EntityManagerHolder(entityManager));
    }
}

public static synchronized void tearDownEntityManager() {
    if (entityManager != null) {
        if (entityManager.isOpen()) {
            entityManager.close();
        }
        if (TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
            TransactionSynchronizationManager.unbindResource(entityManagerFactory);
        }
        if (entityManagerFactory.isOpen()) {
            entityManagerFactory.close();
        }
    }
}
Note that there are inherent risks associated with the Open Session in View pattern. For example, I noticed in the comments that you intend to use threading in your application. Sessions are not threadsafe. So you'll have to make sure you aren't trying to access the database in a threaded manner.*
You'll also have to be more aware of your fetching strategy for collections. With an open session and lazy loading there's always the chance that you'll put undue load on your database.
*I've used this approach in a NetBeans application before, which I know uses threading for certain tasks. We never had any problems with it, but you need to be aware of the risks, of which there are many.
Edit
Depending on your situation, it may also be possible to evict your domain objects from the Session and cache the detached objects for later use. This strategy would of course require that your domain objects not be updated very often, otherwise your application would become unnecessarily complicated.
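A minimal sketch of that eviction strategy, assuming an illustrative Account entity and a hypothetical application-level cache (none of these names come from the question):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.Id;

@Entity
class Account { // illustrative entity
    @Id
    Long id;
}

// Hypothetical application-level cache of detached domain objects.
class DetachedObjectCache {

    private final Map<Long, Account> cache = new ConcurrentHashMap<>();

    Account find(EntityManager em, Long id) {
        return cache.computeIfAbsent(id, key -> {
            Account account = em.find(Account.class, key);
            if (account != null) {
                em.detach(account); // evict: frees the persistence context
            }
            return account;
        });
    }

    // On the rare update, re-attach, flush, then detach and cache again.
    Account update(EntityManager em, Account detached) {
        Account managed = em.merge(detached);
        em.flush();
        em.detach(managed);
        cache.put(managed.id, managed);
        return managed;
    }
}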

Spring, Hibernate - Batch processing of large amounts of data with good performance

Imagine you have a large amount of data in a database, approx. ~100 MB. We need to process all of it somehow (update it, or export it somewhere else). How can this task be implemented with good performance? How should transaction propagation be set up?
Example 1 (with bad performance):
@Singleton
public class ServiceBean {

    public void processAllData() {
        List<Entity> entityList = dao.findAll();
        for (Entity entity : entityList) {
            process(entity);
        }
    }

    private void process(Entity entity) {
        // data processing
        // saves data back (UPDATE operation) or exports it somewhere else (just READs from DB)
    }
}
What could be improved here ?
In my opinion:
I would set the Hibernate batch size (see the Hibernate documentation on batch processing).
I would separate ServiceBean into two Spring beans with different transaction settings. The method processAllData() should run outside a transaction, because it operates on a large amount of data and a potential rollback wouldn't be 'quick' (I guess). The method process(Entity entity) would run in a transaction; it's no big deal to roll back in the case of one data entity.
Do you agree ? Any tips ?
Here are 2 basic strategies:
JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction.
Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL.
You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
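A minimal sketch of both strategies, assuming Hibernate as the provider (the property values, table and column names are illustrative):

import java.util.List;
import org.hibernate.Session;

class BatchingSketch {

    // Strategy 1: JDBC batching plus periodic flush/clear. Assumes these
    // Hibernate properties are configured (values are illustrative):
    //   hibernate.jdbc.batch_size=50
    //   hibernate.order_inserts=true
    //   hibernate.order_updates=true
    void batched(Session session, List<Object> entities) {
        int i = 0;
        for (Object entity : entities) {
            session.persist(entity);
            if (++i % 50 == 0) {   // match the JDBC batch size
                session.flush();   // push the current batch to the database
                session.clear();   // detach everything so the session stays
                                   // small and dirty checking stays cheap
            }
        }
    }

    // Strategy 2: hand-coded SQL through the Work interface, still running
    // inside the Hibernate-managed transaction.
    void concatenatedSql(Session session) {
        session.doWork(connection -> {
            try (var stmt = connection.createStatement()) {
                // Concatenated via semicolons; works in most databases.
                stmt.execute(
                    "UPDATE orders SET status = 'DONE' WHERE status = 'NEW'; "
                    + "DELETE FROM orders WHERE status = 'OBSOLETE';");
            }
        });
    }
}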
There are a few things to keep in mind here:
Loading all entities into memory with a findAll method can lead to OOM exceptions.
You need to avoid attaching all of the entities to a session, since every time Hibernate executes a flush it will need to dirty-check every attached entity. This will quickly grind your processing to a halt.
Hibernate provides a stateless session, which you can use with a scrollable result set to scroll through entities one by one (see the docs). You can then use this session to update each entity without ever attaching it to a session.
The other alternative is to use a stateful session but clear the session at regular intervals, as shown here; a sketch of the stateless approach follows.
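A minimal sketch of the stateless-session approach, assuming Hibernate 6 APIs and an illustrative Account entity:

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

@Entity
class Account { // illustrative entity
    @Id
    Long id;
    String status;
}

class StatelessProcessing {

    void processAll(SessionFactory sessionFactory) {
        // A stateless session keeps no persistence context: nothing is
        // cached, nothing is dirty-checked, so memory use stays flat.
        try (StatelessSession session = sessionFactory.openStatelessSession()) {
            Transaction tx = session.beginTransaction();
            try (ScrollableResults<Account> results = session
                    .createQuery("from Account", Account.class)
                    .scroll(ScrollMode.FORWARD_ONLY)) {
                while (results.next()) {
                    Account account = results.get();
                    account.status = "PROCESSED"; // illustrative business logic
                    session.update(account);      // immediate UPDATE, never attached
                }
            }
            tx.commit();
        }
    }
}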
I hope this is useful advice.

Which method in BMP is used to avoid unnecessary round trips to the database?

If I'm using a BMP bean, is there any method which avoids unnecessary round trips to the database and increases efficiency?
Do any of these methods serve the purpose? (Question from a certification test)
ejbSave(), ejbStore() or ejbPersist()
In a multi-tiered architecture, with database, application server, and web layers, you optimize performance by reducing the network round trips. The best approach is said to be to start and stop transactions at the application server level, in the EJB container. So I would like to know which methods help reduce unnecessary round trips in bean-managed persistence beans. I am new to EJB, so I am trying to learn the concepts.
ejbSave() and ejbPersist() do not exist...
You wouldn't have to deal with any of these methods: 'ejbSave(), ejbStore() or ejbPersist()'
If I'm using BMP bean, is there any method which avoids unnecessary roundtrips to database
Short Answer:
Yes, methods of EntityManager
Long Answer:
To avoid network round trips to the database, you just have to set the transaction boundaries correctly. When you use the methods provided by EntityManager (I'm talking about JPA), the methods act on the persistence context. The persistence context being a cache, real database hits are avoided until the commit happens.
Following is a section from the TomEE docs:
JPA 101
If there's one thing you have to understand to successfully use JPA (Java
Persistence API) it's the concept of a Cache. Almost everything boils
down to the Cache at one point or another. Unfortunately the Cache is an
internal thing and not exposed via the JPA API classes, so it is not easy to
touch or feel from a coding perspective.
Here's a quick cheat sheet of the JPA world:
A Cache is a copy of data, copy meaning pulled from but living
outside the database.
Flushing a Cache is the act of putting modified data back into the
database.
A PersistenceContext is essentially a Cache. It also tends to have
its own non-shared database connection.
An EntityManager represents a PersistenceContext (and therefore a
Cache)
An EntityManagerFactory creates an EntityManager (and therefore a
PersistenceContext/Cache)
With a RESOURCE_LOCAL unit, you are
responsible for EntityManager (PersistenceContext/Cache) creating and
tracking...
-- You must use the EntityManagerFactory to get an EntityManager
-- The resulting EntityManager instance is a
PersistenceContext/Cache
-- An EntityManagerFactory can be injected via the @PersistenceUnit
annotation only (not @PersistenceContext)
-- You are not allowed to use @PersistenceContext to refer to a unit
of type RESOURCE_LOCAL
-- You must use the EntityTransaction API to begin/commit around
every call to your EntityManager
-- Calling entityManagerFactory.createEntityManager() twice results in
two separate EntityManager instances and therefore two separate
PersistenceContexts/Caches.
-- It is almost never a good idea to have more than one instance of
an EntityManager in use (don't create a second one unless you've destroyed
the first)
With a TRANSACTION unit, the container
will do the EntityManager (PersistenceContext/Cache) creating and tracking...
-- You cannot use the EntityManagerFactory to get an EntityManager
-- You can only get an EntityManager supplied by the container
-- An EntityManager can be injected via the @PersistenceContext
annotation only (not @PersistenceUnit)
-- You are not allowed to use @PersistenceUnit to refer to a unit of
type TRANSACTION
-- The EntityManager given by the container is a reference to the
PersistenceContext/Cache associated with a JTA Transaction.
-- If no JTA transaction is in progress, the EntityManager cannot be
used because there is no PersistenceContext/Cache.
-- Everyone with an EntityManager reference to the same unit in the
same transaction will automatically have a reference to the same
PersistenceContext/Cache
-- The PersistenceContext/Cache is flushed and cleared at JTA
commit time
Cache == PersistenceContext
The concept of a database cache is an extremely important concept to be
aware of. Without a copy of the data in memory (i.e. a cache) when you
call account.getBalance() the persistence provider would have to go read
the value from the database. Calling account.getBalance() several times
would cause several trips to the database. This would obviously be a big
waste of resources. The other side of having a cache is that when you call
account.setBalance(5000) it also doesn't hit the database (usually). When
the cache is "flushed" the data in it is sent to the database via as many
SQL updates, inserts and deletes as are required. That is the basics of
Java persistence of any kind, all wrapped in a nutshell. If you can
understand that, you're good to go in nearly any persistence technology
Java has to offer.
Complications can arise when there is more than one
PersistenceContext/Cache relating the same data in the same transaction.
In any given transaction you want exactly one PersistenceContext/Cache for
a given set of data. Using a TRANSACTION unit with an EntityManager
created by the container will always guarantee that this is the case. With
a RESOURCE_LOCAL unit and an EntityManagerFactory you should create and use
exactly one EntityManager instance in your transaction to ensure there is
only one active PersistenceContext/Cache for the given set of data active
against the current transaction.
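A minimal sketch of the RESOURCE_LOCAL rules above, using Jakarta Persistence (the unit name and entity are illustrative, and row 1 is assumed to exist):

import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.EntityTransaction;
import jakarta.persistence.Id;
import jakarta.persistence.Persistence;

@Entity
class Account { // illustrative entity
    @Id
    Long id;
    int balance;
}

class ResourceLocalDemo {

    public static void main(String[] args) {
        // RESOURCE_LOCAL: you create and track the EntityManager yourself.
        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("demo-unit");
        EntityManager em = emf.createEntityManager(); // one EM == one PersistenceContext/Cache

        EntityTransaction tx = em.getTransaction(); // EntityTransaction API, not JTA
        tx.begin();

        Account a = em.find(Account.class, 1L);    // SELECT; instance is now cached
        a.balance += 100;                          // changes only the cached copy
        Account same = em.find(Account.class, 1L); // no SQL: served from the cache

        tx.commit(); // the cache is flushed: one UPDATE goes to the database

        em.close();  // destroy this EM before ever creating another one
        emf.close();
    }
}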

Spring Bean Hangs on Method with #Transactional

Just a little background: I'm a new developer who has recently taken over a major project after the senior developer left the company, before I could develop a full understanding of how he structured it. I'll try to explain my issue the best I can.
This application creates several MessageListener threads to read objects from JMS queues. Once an object is received, the data is manipulated based on some business logic and then mapped to a persistence object to be saved to an Oracle database using a Hibernate EntityManager.
Up until a few weeks ago there hadn't been any major issues with this configuration in the year or so since I joined the project. But for one of the queues (the issue is isolated to this particular queue), the Spring-managed bean that processes the received object hangs at the method below. My debugging has led me to conclude that it completes everything within the method but hangs upon returning. After weeks of trying to resolve this I'm at the end of my rope. Any help would be greatly appreciated.
Since each MessageListener gets its own processor, this hanging method only affects the incoming data on one queue.
@Transactional(propagation = Propagation.REQUIRES_NEW, timeout = 180)
public void update(UserRelatedData userData, User user, Company company, ...)
{
...
....
//business logic performed on user object
....
......
entityMgr.persist(user);
//business logic performed on userData object
...
....
entityMgr.persist(userData);
...
....
entityMgr.flush();
}
I inserted debug statements just to walk through the method and it completes everything, including entityMgr.flush().
REQUIRES_NEW may hang in a test context because the transaction manager used in unit testing doesn't support nested transactions...
From the Javadoc of JpaTransactionManager:
* <p>This transaction manager supports nested transactions via JDBC 3.0 Savepoints.
* The {@link #setNestedTransactionAllowed "nestedTransactionAllowed"} flag defaults
* to {@code false} though, since nested transactions will just apply to the JDBC
* Connection, not to the JPA EntityManager and its cached entity objects and related
* context. You can manually set the flag to {@code true} if you want to use nested
* transactions for JDBC access code which participates in JPA transactions (provided
* that your JDBC driver supports Savepoints). <i>Note that JPA itself does not support
* nested transactions! Hence, do not expect JPA access code to semantically
* participate in a nested transaction.</i>
So clearly, if you don't make the following call (in Java config), or set the equivalent flag in your XML config:
txManager.setNestedTransactionAllowed(true);
or if your driver doesn't support Savepoints, it's "normal" to get problems with REQUIRES_NEW...
(Some may prefer an explicit "nested transactions not supported" exception.)
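For reference, a minimal sketch of setting that flag in Java config (the bean wiring is illustrative):

import jakarta.persistence.EntityManagerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
class TxConfig {

    @Bean
    PlatformTransactionManager transactionManager(EntityManagerFactory emf) {
        JpaTransactionManager txManager = new JpaTransactionManager(emf);
        // Allow nested transactions via JDBC 3.0 savepoints. Note this
        // applies to the JDBC Connection only, not to the JPA persistence
        // context (see the Javadoc quoted above), and the JDBC driver
        // must support savepoints.
        txManager.setNestedTransactionAllowed(true);
        return txManager;
    }
}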
This kind of problems can show up when underlying database has locks from uncommitted changes.
What I would suspect is that some other code performed inserts/deletes on the userData table(s) outside a transaction, or in a transaction which takes a very long time to execute because it's a batch job or similar. You should analyze all the code referring to these tables and look for a missing @Transactional.
Besides this answer, you may also check the isolation level of your transaction; perhaps it's too restrictive.
Does the update() method hang forever, or does it throw an exception when the timeout elapses?
Unfortunately I have the same problem with Propagation.REQUIRES_NEW. Removing it resolves the problem. The debugger shows me that the commit method is hanging (invoked from the @Transactional aspect implementation).
The problem appears only in the test spring context, when the application is deployed to the application server it works fine.

JPA2 Entities Caching

As it stands I am using a JSF request-scoped bean to do all my CRUD operations. As I'm sure you know, Tomcat doesn't provide container-managed persistence, so in my CRUD request bean I am using an EntityManagerFactory to get hold of an entity manager. The validity of my choice to use a request-scoped bean for this task is probably open for discussion (again), but I've been trying to put it in the context of what I've read in the articles you linked to, specifically the first and second one. From what I gather, EclipseLink uses a Level 2 cache by default, which stores cached entities. The EclipseLink Examples - JPA Caching page says:
The shared cache exists for the duration of the persistence unit (EntityManagerFactory, or server)
Now doesn't that make my cached entities live only for a fraction of time, during the call to the CRUD request bean, because the moment the bean is destroyed, and with it the EntityManagerFactory, so is the cache? Also, the last part of the above sentence, "EntityManagerFactory, or server", confuses me: what precisely is meant by "or server" in this context, and how does one control it? If I use the @Cache annotation and set an appropriate expiry attribute, will that do the job and keep the entities in the server's L2 cache regardless of whether my EntityManagerFactory has been destroyed?
I understand there is a lot to consider and each application has specific requirements. From my point of view, configuring the L2 cache is probably the most desirable (if not the only, on Tomcat) option to get things optimized. Quoting from your first link:
The advantages of L2 caching are:
avoids database access for already loaded entities
faster for reading frequently accessed unmodified entities
The disadvantages of L2 caching are:
memory consumption for large amount of objects
stale data for updated objects
concurrency for write (optimistic lock exception, or pessimistic lock)
bad scalability for frequent or concurrently updated entities
You should configure L2 caching for entities that are:
read often
modified infrequently
not critical if stale
Almost all of the above points apply to my app. At the heart of it, amongst other things, is constant and relentless reading of entities and displaying them on the website (the app will serve as a portal for listing properties). There's also a small shopping cart being built into the application, but the products sold are not tangible stock items but services. In this case stale entities are no problem, and neither, I think, is concurrency, as the products (here services) will never be written to. So the entities will be read often and modified infrequently (and those modified are not part of the cart anyway, and even those are modified rarely), and therefore it's not critical if they're stale. Finally, the first two points seem to be exactly what I need, namely avoiding database access for already loaded entities and fast reading of frequently accessed unmodified entities. But there is one point among the disadvantages which still concerns me a bit: memory consumption for a large amount of objects. Isn't that similar to my original problem?
My current understanding is that there are two options, only one of which applies to my situation:
To delegate the job of longer-term caching to the persistence layer, I would need access to the PersistenceContext, so that I could create a session-scoped bean and set PersistenceContextType.EXTENDED. (This option doesn't apply to me: I have no access to the PersistenceContext.)
Configure the L2 @Cache annotation on entities, or, as in option 1 above, create a session-scoped bean that will handle long-term caching. But isn't this just going back to my original problem?
I'd really like to hear your opinion and see what you think could be a reasonable way to approach this, or perhaps how you have approached it in your previous projects. Oh, and one more thing, just to confirm: when annotating an entity with @Cache, will all linked entities be cached along with it, so that I don't have to annotate all of them?
Again all the comments and pointers much appreciated.
Thanks for your answer... when you say:
"In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server."
Does that mean I could, for example, declare and initialize a static EntityManagerFactory field in an application-scoped bean, to be used later by all the beans throughout the life of the application?
EclipseLink uses a shared cache by default. This is shared for all EntityManagers accessed from an EntityManagerFactory. You do not need to do anything to enable caching.
In general, you do not want to be creating a new EntityManagerFactory per request, only a new EntityManager. Creating a new EntityManagerFactory is quite expensive, so not a good idea, even ignoring caching (it has its own connection pool, must initialize the meta-data, etc.).
In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server. Either never close it, or close it when a Servlet is destroyed.
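A minimal sketch of such a static manager (the class and persistence-unit names are illustrative):

import jakarta.persistence.EntityManager;
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.Persistence;

// Hypothetical Tomcat-friendly holder: one EntityManagerFactory for the
// whole server, one short-lived EntityManager per request.
public final class PersistenceManager {

    private static final EntityManagerFactory EMF =
            Persistence.createEntityManagerFactory("my-unit");

    private PersistenceManager() {
    }

    // EntityManagers are cheap: create one per request and close it after.
    public static EntityManager createEntityManager() {
        return EMF.createEntityManager();
    }

    // Call from a ServletContextListener's contextDestroyed(), or never.
    public static void close() {
        if (EMF.isOpen()) {
            EMF.close();
        }
    }
}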
