Spring Transaction Management for files - spring

I am trying to reset the state of a project using config files. The idea is to delete everything that is in memory and then read the files again. I have several files that contain different configs that end up being objects in memory. Now I want the whole operation to be transactional in the sense that if one read fails then the process stops:
public void reset() {
    Object config1 = helper.readFile1();
    Object config2 = helper.readFile2();
    // This is the part that should be transactional
    repository.deleteAll();
    repository.saveConfig1(config1);
    repository.saveConfig2(config2);
}
Repository holds some data in memory. Delete cleans up this data. Save adds data to memory.
If saveConfig1 fails then the whole set of operations on memory should be rolled back.
So far I have read about Spring's Platform Transaction Manager:
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/transaction/PlatformTransactionManager.html
However, the different implementations are all tailored to different database-related technologies, not to objects in memory. Is there any way to get transactional behavior when working with objects that hold data in memory?
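For illustration, here is a minimal sketch of one way to approximate all-or-nothing semantics for plain in-memory state, assuming the repository is essentially a map (the helper and repository names below are hypothetical): build the new state completely, then swap it in with a single reference assignment, so a failed read never touches the live data.

import java.util.HashMap;
import java.util.Map;

// Hypothetical helper mirroring the question's readFile1()/readFile2().
interface ConfigHelper {
    Object readFile1();
    Object readFile2();
}

public class InMemoryConfigRepository {

    // The live state; replaced in one reference assignment, so readers always
    // see either the complete old snapshot or the complete new one.
    private volatile Map<String, Object> configs = Map.of();

    public void reset(ConfigHelper helper) {
        // Read everything first; any failure here leaves the current state untouched.
        Object config1 = helper.readFile1();
        Object config2 = helper.readFile2();

        Map<String, Object> fresh = new HashMap<>();
        fresh.put("config1", config1);
        fresh.put("config2", config2);

        // "Commit": swap the old state for the new one in a single step.
        this.configs = Map.copyOf(fresh);
    }
}

This sidesteps PlatformTransactionManager entirely; whether that is acceptable depends on whether other threads may be mutating the repository concurrently.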

Related

Cache Object Evicting by JMS Message Listener Notifier

Suppose there are two Java EE applications, and the second application depends on the first as a jar and a war. I am using a class that implements the JMS MessageListener interface and its onMessage() method. I send the object that was modified in the first application. When the second application receives the JMS message, I evict the object and refresh the cache. But the child @OneToMany association is not being evicted. What could be the problem?
I read that if Cascade.ALL is put as an annotation on the child association, then when the parent is evicted the child is evicted as well. What could be the problem?
I don't think there is a problem as such. When you evict an object from a remote cache, there is nothing to say that the @OneToMany relationship within the cached object was even loaded. The cache cannot know which referenced entities to invalidate without loading that relationship, and even if it does have that state, it might miss newly added entities if the @OneToMany was modified in the remote cache.
If you want to cascade eviction, you'll have to send an eviction message that includes the graph yourself. Your application knows and has loaded the graph, so it can send the affected classes and IDs to the remote servers, sparing them any need to hit the database just to ensure things are evicted. Whether that is worth doing depends entirely on your reason for evicting the graph, though, as it may be an unnecessary burden: many leaf nodes in a graph generally remain unchanged, and when they do change they have their own cache eviction policies that will evict them from remote caches.
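A rough sketch of what such a graph-aware eviction message and its handling might look like (the message and field names are invented; the only standard API used here is javax.persistence.Cache#evict):

import java.io.Serializable;
import java.util.List;
import javax.persistence.Cache;
import javax.persistence.EntityManagerFactory;

// Hypothetical payload: the sender lists every entity it wants evicted,
// parent and children alike, as (entity class name, id) pairs.
class EvictionEntry implements Serializable {
    String entityClassName;
    Serializable id;
}

class EvictionMessage implements Serializable {
    List<EvictionEntry> entries;
}

class EvictionMessageHandler {

    private final EntityManagerFactory emf;

    EvictionMessageHandler(EntityManagerFactory emf) {
        this.emf = emf;
    }

    // Called from the JMS listener's onMessage() after deserializing the payload.
    void handle(EvictionMessage message) throws ClassNotFoundException {
        Cache cache = emf.getCache();
        for (EvictionEntry entry : message.entries) {
            // Evict each entity explicitly from the second-level cache; no database hit needed.
            cache.evict(Class.forName(entry.entityClassName), entry.id);
        }
    }
}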

Spring JPA: keeping the persistence context small

How can I keep the persistence context small in a Spring JPA environment?
Why: I know that by keeping the persistence context small, there will be a significant performance boost!
The main problem area is:
@Transactional
void methodA() {
    WHILE retrieving next object of 51M (via a stateless session connection) DO
        get some further (read-only) data
        IF condition holds THEN
            assessment = retrieve assessment object (= record from database)
            change assessment data
            save the assessment to the database
}
From experiments in this problem domain I know that clearing the persistence context every 250 iterations makes performance a lot better.
When I add these lines to the code, so that the persistence context is flushed and cleared every 250 iterations:
@PersistenceContext
private EntityManager em;

WHILE ...
    ...
    IF counter++ % 250 == 0 THEN
        em.flush()
        em.clear()
}
Then I get errors like "cannot reliably perform the flush operation".
I tried making the main transaction read-only and the assessment-save part 'requires new' (Propagation.REQUIRES_NEW), but then I get errors like 'operating on a detached entity'. Very strange, because I never revisit an entity that is already loaded.
So, how can I keep the persistence context small?
Have tried 10s of ways. Help is really appreciated.
I would suggest you move all the condition logic into your query so that you don't even have to load that many rows/objects. Or even better, write an update query that does all of that in a single transaction so you don't need to transfer any data at all between your application and database.
I don't think that flushing is necessary with a stateless session, as it doesn't keep state, i.e. it flushes and clears the persistence context after every operation. Apart from that, I also think this might not be what you really want, as it could lead to re-fetching of data.
If you don't want the persistence context to fill up, then use DTOs for fetching the data and execute update statements to flush the changes.
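With Spring Data JPA, that combination of a DTO projection for reading and a bulk statement for writing could look roughly like this (the Assessment entity and its fields are invented for the sketch):

import java.util.stream.Stream;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

@Entity
class Assessment {
    @Id
    Long id;
    String status;
}

// Interface-based projection: only the columns the decision logic needs,
// never attached to the persistence context.
interface AssessmentView {
    Long getId();
    String getStatus();
}

interface AssessmentRepository extends JpaRepository<Assessment, Long> {

    // Stream lightweight views instead of loading millions of managed entities.
    @Query("select a.id as id, a.status as status from Assessment a")
    Stream<AssessmentView> streamAllViews();

    // Write back with a bulk statement; nothing ends up in the persistence context.
    @Modifying
    @Query("update Assessment a set a.status = :status where a.id = :id")
    int updateStatus(@Param("id") Long id, @Param("status") String status);
}

The stream has to be consumed inside a transaction and closed afterwards, and the @Modifying query bypasses the persistence context entirely, which here is exactly the point.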

Problems with Spring and Hibernate SessionFactory: Domain object scope restricted to session

I have been using the session factory (a singleton bean injected into the DAO objects) in my Spring/Hibernate application. I am using a service-layer architecture, and I have the following issue:
Any time I get a domain object from the database, it uses a new session provided by the Hibernate session factory. When the same row is requested several times, this leads to multiple instances of the same domain object. (With a single session, it would return multiple references pointing to the same instance.) Thus, any change made to one of those domain objects is not reflected in the other domain objects representing the same row.
I am developing a Swing application with multiple views, and I fetch the same DB row from different locations (and queries), so I need the domain objects to point to the same instance.
My question, then: is there a way to make this happen using the SessionFactory? If not, is it good practice to use a single session for my whole application? In that case, how and where should I declare this session? (Should it be a bean injected into the DAO objects just like the sessionFactory?)
Thank you in advance for your help
The Hibernate session (I will call it an h-session) in Spring is usually bound to the thread (see the JavaDoc for HibernateTransactionManager), so an h-session is acquired once per thread.
The first-level cache (the h-session cache, which is always turned on) is used to return the same object if you call get or load several times within one h-session. But this cache does not work for queries.
Also, you shouldn't forget about problems related to transaction isolation. Most applications use the "read committed" isolation level, and that level is affected by a phenomenon known as "non-repeatable reads": you can receive several versions of the same row in one transaction if you query for that row several times (because the row could be updated between the queries by another transaction).
So you shouldn't query several times for the same data in one h-session/transaction.
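As a small illustration of the first-level cache point (a sketch in Hibernate 5 style; Person is just an example mapped entity):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.Session;

@Entity
class Person {
    @Id
    Long id;
    String name;
}

class FirstLevelCacheDemo {

    void demo(Session session) {
        // get()/load() by id within one Session return the same managed instance:
        Person a = session.get(Person.class, 1L);
        Person b = session.get(Person.class, 1L);
        assert a == b; // the second call is served from the first-level cache, no SQL issued

        // A query, by contrast, always goes to the database, even though Hibernate
        // will hand back the already-managed instance for rows present in the Session.
        Person c = session.createQuery("from Person p where p.id = :id", Person.class)
                .setParameter("id", 1L)
                .getSingleResult();
    }
}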
You're looking for the Open Session in View Pattern. Essentially, you want to bind a Session to your thread on application startup and use the same Session throughout the lifetime of the application. You can do this by creating a singleton util class which keeps a session like so (note that the example I have uses an EntityManager instead of a Session, but your code will be essentially the same):
// Assumed to be initialized elsewhere, e.g. via Persistence.createEntityManagerFactory(...)
private static EntityManagerFactory entityManagerFactory;
private static EntityManager entityManager;

public static synchronized void setupEntityManager() {
    if (entityManager == null) {
        entityManager = entityManagerFactory.createEntityManager();
    }
    if (!TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
        TransactionSynchronizationManager.bindResource(entityManagerFactory, new EntityManagerHolder(entityManager));
    }
}

public static synchronized void tearDownEntityManager() {
    if (entityManager != null) {
        if (entityManager.isOpen()) {
            entityManager.close();
        }
        if (TransactionSynchronizationManager.hasResource(entityManagerFactory)) {
            TransactionSynchronizationManager.unbindResource(entityManagerFactory);
        }
        if (entityManagerFactory.isOpen()) {
            entityManagerFactory.close();
        }
    }
}
Note that there are inherent risks associated with the Open Session in View pattern. For example, I noticed in the comments that you intend to use threading in your application. Sessions are not threadsafe. So you'll have to make sure you aren't trying to access the database in a threaded manner.*
You'll also have to be more aware of your fetching strategy for collections. With an open session and lazy loading there's always the chance that you'll put undue load on your database.
*I've used this approach in a NetBeans application before, which I know uses threading for certain tasks. We never had any problems with it, but you need to be aware of the risks, of which there are many.
Edit
Depending on your situation, it may also be possible to evict your domain objects from the Session and cache the detached objects for later use. This strategy would, of course, require that your domain objects not be updated very often; otherwise your application would become unnecessarily complicated.
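A very rough sketch of that idea, reusing the Person entity from the earlier sketch (the cache shape is invented; the only Hibernate calls used are openSession(), get(), and evict()):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

class DetachedDomainCache {

    private final SessionFactory sessionFactory;

    // Detached instances shared between views; only safe if they change rarely.
    private final Map<Long, Person> cache = new ConcurrentHashMap<>();

    DetachedDomainCache(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    Person find(Long id) {
        return cache.computeIfAbsent(id, key -> {
            Session session = sessionFactory.openSession();
            try {
                Person p = session.get(Person.class, key);
                if (p != null) {
                    session.evict(p); // detach it so it can outlive this short-lived Session
                }
                return p;
            } finally {
                session.close();
            }
        });
    }
}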

Spring, Hibernate - Batch processing of large amounts of data with good performance

Imagine you have a large amount of data in the database, approx. 100 MB. We need to process all of it somehow (update it or export it somewhere else). How do you implement this task with good performance? How should transaction propagation be set up?
Example 1 (with bad performance):
@Singleton
public class ServiceBean {

    public void processAllData() {
        List<Entity> entityList = dao.findAll();
        for (Entity entity : entityList) {
            process(entity);
        }
    }

    private void process(Entity entity) {
        // data processing
        // saves the data back (UPDATE operation) or exports it somewhere else (just READs from the DB)
    }
}
What could be improved here ?
In my opinion:
I would set the Hibernate batch size (see the Hibernate documentation on batch processing).
I would separate ServiceBean into two Spring beans with different transaction settings. The method processAllData() should run outside a transaction, because it operates on large amounts of data and a potential rollback wouldn't be 'quick' (I guess). The method process(Entity entity) would run in a transaction; rolling back a single entity is no big deal.
Do you agree ? Any tips ?
Here are 2 basic strategies:
JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction.
Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL.
You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
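For the JDBC batching strategy, the write loop might look roughly like this (Entity stands for the question's entity type, the batch size is just an example, and the hibernate.jdbc.batch_size, hibernate.order_inserts and hibernate.order_updates settings are assumed to be configured as described above):

import java.util.List;
import org.hibernate.Session;

class BatchWriter {

    private static final int BATCH_SIZE = 30; // keep in sync with hibernate.jdbc.batch_size

    // Persists a large list while keeping the Session (and memory) small.
    void saveAll(Session session, List<Entity> entities) {
        int i = 0;
        for (Entity entity : entities) {
            session.persist(entity);
            if (++i % BATCH_SIZE == 0) {
                session.flush(); // push the current batch of statements to the JDBC driver
                session.clear(); // drop the managed instances so they can be garbage collected
            }
        }
        session.flush();
        session.clear();
    }
}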
There are a few things to keep in mind here:
Loading all entities into memory with a findAll method can lead to OOM exceptions.
You need to avoid attaching all of the entities to a session, since every time Hibernate executes a flush it has to dirty-check every attached entity. This will quickly grind your processing to a halt.
Hibernate provides a stateless session, which you can use with a scrollable result set to scroll through the entities one by one (docs here); a sketch of this approach follows at the end of this answer. You can then use this session to update entities without ever attaching them to a session.
The other alternative is to use a stateful session but clear the session at regular intervals, as shown here.
I hope this is useful advice.
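A sketch of the stateless-session approach, in Hibernate 5 style (Entity again stands for the question's entity type):

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

class StatelessProcessor {

    // Streams through all rows without ever filling up a persistence context.
    void processAll(SessionFactory sessionFactory) {
        StatelessSession session = sessionFactory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        try {
            ScrollableResults results = session
                    .createQuery("from Entity e")
                    .scroll(ScrollMode.FORWARD_ONLY);
            while (results.next()) {
                Entity entity = (Entity) results.get(0);
                // ... apply the business logic to the entity ...
                session.update(entity); // issues the UPDATE directly; nothing stays attached
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}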

How to use a memory cache in a concurrency critical context

Consider the following two methods, written in pseudocode, that fetch a complex data structure and update it, respectively:
getData(id) {
    if (isInCache(id)) return getFromCache(id)   // already in cache?
    data = fetchComplexDataStructureFromDatabase(id)   // time consuming!
    setCache(id, data)   // update cache
    return data
}

updateData(id, data) {
    storeDataStructureInDatabase(id, data)
    clearCache(id)
}
In the above implementation, there is a problem with concurrency, and we might end up with outdated data in the cache: consider two parallel executions running getData() and updateData(), respectively. If the first execution fetches data from the cache exactly in between the other execution's call to storeDataStructureInDatabase() and clearCache(), then we will get an outdated version of the data. How would you get around this concurrency problem?
I considered the following solution, where the cache is invalidated just before data is committed:
storeDataStructureInDatabase(id, data) {
    executeSql("UPDATE table1 SET...")
    executeSql("UPDATE table2 SET...")
    executeSql("UPDATE table3 SET...")
    clearCache(id)
    executeSql("COMMIT")
}
But then again: if another execution runs getData() in between the other execution's clearCache() and COMMIT, it will read the not-yet-committed (old) data from the database and put outdated data back into the cache. Problem not solved.
When you think in terms of caching, you cannot completely prevent retrieving outdated data.
For example, suppose someone sends an HTTP request (if your application is a web application) that will later render the cache invalid. Should we consider the cache invalid when the POST request starts? When the request is handled by your server? When your controller code starts? No. The cache really becomes invalid only when the database transaction ends; not even when the transaction starts, only at the end, in the COMMIT phase. And any process already working with the previous data has very little chance of being aware that the data has changed. In a web application, what about HTML pages showing outdated data in a browser; do you want to flush those pages too?
But let's assume your parallel processes are not just serving the web, but are genuinely concurrency-critical parallel jobs.
One problem is that your cache is not handled by the database server, so it is not part of the transaction's COMMIT/ROLLBACK. You cannot clear the cache first and rebuild it if you roll back, so you can only clear and rebuild the cache after the transaction is committed.
And that leaves the possibility of getting an outdated version of the cache if your get comes between the database commit and the cache clear instruction. So:
Is it really important that you sometimes have an outdated version of the cache? Say a parallel process committed just a few milliseconds before you would have retrieved the new version (so you got the old one), you work with it for maybe 40 ms, and then you build a final report on it without noticing that the cache was flushed 15 ms before the end of the work. If your process's response cannot contain any outdated data, then you'll have to check data validity before outputting it (so you should re-check at the end that all data used in the process is still valid).
If you don't want to re-check data validity, that means your process should take some lock (a semaphore?) when it starts and release it only at the end of the work; you are then serializing your work. Databases can speed up serialization by working with pseudo-serializable isolation levels for transactions and aborting your transaction if any concurrent change makes this pseudo-serialization unsafe. But here you are not working only with a database, so you would have to do the serialization on your own side.
Process serialization is slow, but you may try to do the same as the database: run jobs in parallel and invalidate any running job when the data is altered (i.e. have something that detects your cache clear, then kills and reruns all affected parallel jobs, which implies you have something orchestrating all the parallel jobs),
or simply accept that you can have slightly stale, already-invalidated data. If we are talking about a web application, by the time your response travels over TCP/IP to the client browser it may already be invalid.
Chances are that you will accept working with outdated cache data. The only really important point is this: if you cannot trust your cache data for something truly critical, then you shouldn't use a cache for it, for example if you are manipulating accounting data. The only way to get a serialization of parallel tasks is to do:
In the writing process: put all the important reads (the ones whose results will drive writes) and all the writes in a transaction with a high isolation level (serializable) and with all the necessary row locks. That is hard to do even when working only with a database; it is practically impossible if you add an external cache for read operations.
In parallel read processes: do what you want (read from the external cache) as long as the data read will not be used for write operations. If some of the read data will later be used for a write, its validity has to be checked in the write transaction (i.e. in the writing process). Why not add a timestamp or version watermark to the data, so that when it comes back for a write operation you can tell whether it is still valid? A sketch of that idea follows below.
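For example (all names invented), using a plain version number as the watermark and checking it inside the write statement itself:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class VersionedWriter {

    // The cached DTO carries the version it was read with.
    static class CachedRecord {
        long id;
        long version;
        String payload;
    }

    // Writes back only if the row still has the version we read; otherwise the
    // caller must re-read, because someone else committed in the meantime.
    boolean updateIfStillValid(Connection connection, CachedRecord record, String newPayload)
            throws SQLException {
        String sql = "UPDATE records SET payload = ?, version = version + 1 "
                + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, newPayload);
            ps.setLong(2, record.id);
            ps.setLong(3, record.version);
            return ps.executeUpdate() == 1; // 0 rows touched means the cached copy was stale
        }
    }
}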

Resources