Hibernate Envers with QueryDSL Update - spring

Hibernate, Hibernate Envers and QueryDSL are configured and working correctly in a Spring Boot 1.4.1.RELEASE application.
The problem is that when using UpdateClause<JPAUpdateClause> updateQueryBuilder = queryFactory.update(collectionTransaction); to build an update query and then executing that query, Hibernate Envers does not pick up and audit those changes.
The following is the Spring Data JPA repository implementation that uses QueryDSL:
public class CollectionTransactionRepositoryImpl extends QueryDslRepositorySupport implements CollectionTransactionRepositoryCustom {

    @Autowired
    private JPAQueryFactory queryFactory;

    public CollectionTransactionRepositoryImpl() {
        super(CollectionTransaction.class);
    }

    @Override
    public Collection<CollectionTransaction> updateCollectionTransaction(UpdateCollectionTransaction updateCollectionTransaction) {
        QCollectionTransaction collectionTransaction = QCollectionTransaction.collectionTransaction;
        UpdateClause<JPAUpdateClause> updateQueryBuilder = queryFactory.update(collectionTransaction);
        // ..... Code omitted for brevity
        long updated = updateQueryBuilder.execute();
        // .....
        return ...
    }
}
Is it possible for Hibernate Envers to pick up changes in this situation?

This is a known concern outlined in JIRA HHH-10318.
Envers works based on Hibernate's event subsystem where Hibernate effectively notifies various callbacks that state for an entity has been modified in some way, and provides both the prior and new entity state. This state is precisely what Envers uses to determine what changed and insert audit change rows.
Let's take a trivial example:
UPDATE MyEntity e SET e.status = :status
Hibernate will perform the following tasks:
Flush any pending modifications in the persistence context.
Invalidate any cached instances of MyEntity.
Execute the bulk update operation.
Nowhere in any of these steps does Hibernate load existing state. It simply guarantees that pending changes are flushed prior to the bulk update and that, because of the bulk update, any subsequent operations will fetch from the datastore rather than from a cache.
Therefore, from Envers' perspective, it receives no callbacks and thus isn't aware that any operation took place, because Hibernate ORM cannot provide any entity state for such an operation; that state simply does not exist.
The big question here is how (if possible) to model and handle a change unit for such an operation.
It's difficult because Envers would effectively need some type of PreBulkOpEvent so that it can cache what is about to change, and a PostBulkOpEvent in which to merge the two states and generate the change log entries. The concern with such a concept really centers on how to do this efficiently, avoiding:
Running out of memory due to large result-set manipulation.
Long execution time to load state from datastore for large result-set manipulation.
Anyway, you're welcome to read over the JIRA issue and provide any feedback or ideas. But presently, it's just something that falls outside the scope of what we can capture.
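One pragmatic workaround, if the affected row count is modest, is to give up the single bulk UPDATE and route the change through the persistence context instead, so Envers receives its usual per-entity events. A rough sketch against the repository above (the getIds/getStatus/setStatus accessors are hypothetical, not from the original code):

```java
@Override
@Transactional
public Collection<CollectionTransaction> updateCollectionTransaction(
        UpdateCollectionTransaction update) {
    QCollectionTransaction ct = QCollectionTransaction.collectionTransaction;

    // Load the affected entities into the persistence context first...
    List<CollectionTransaction> affected = queryFactory
            .selectFrom(ct)
            .where(ct.id.in(update.getIds()))
            .fetch();

    // ...then mutate them individually. Dirty checking flushes each
    // change through Hibernate's event pipeline, so Envers audits it.
    for (CollectionTransaction tx : affected) {
        tx.setStatus(update.getStatus());
    }
    return affected;
}
```

This trades one SQL statement for N individual updates, so it is only sensible when the result set is small enough to hold in memory, which is exactly the concern raised in the JIRA issue.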

Related

How to get updated objects after flush() in the same transaction (Hibernate / Spring Boot)

I have a list of ~10,000 objects.
I am trying to call a MySQL update query (procedure) and then get the updated objects inside the same transaction.
Can this be achieved?
When I call a delete statement + flush(), Hibernate returns the correct objects (the deleted objects are missing).
But when I try an update statement + flush(), Hibernate returns the initial, unchanged objects.
@Transactional
void test() {
    // ...
    em.createQuery("delete from StorefrontProduct sp where sp in (:storefrontProducts)")
            .setParameter("storefrontProducts", storefrontProductsToDelete)
            .executeUpdate();

    // example
    em.createQuery("update StorefrontProduct sp set sp.orderIndex=0 where sp.id=90")
            .executeUpdate();

    em.flush();

    // Simple JPA query
    List<StorefrontProduct> result = repository.findAllByPreviousOrderIndexIsNotNull();
    // additional code....
}
After running the code above and putting a breakpoint after the findAll call, the objects affected by the first (delete) query were indeed deleted and flushed, but the update query's changes were not visible.
That is a known, counterintuitive behaviour of Hibernate.
First of all, the em.flush() call might be superfluous if the flush mode is set to AUTO (in that case Hibernate automatically synchronises the persistence context (session-level cache) with the underlying database prior to executing update/delete queries).
Delete and successive Select case:
You issue a delete, then a select. Since the select no longer sees the deleted records, they are absent from the result set; however, if you call findById you may still find the deleted records.
Update and successive Select case:
You issue an update, then a select. When processing the result set, Hibernate sees both the records stored in the database and the records stored in the persistence context, and it assumes the persistence context is the source of truth. That is why you see "stale" data.
There are the following options to mitigate that counterintuitive behaviour:
do not perform direct updates; use the "slow" find/save API instead
either detach or refresh the stale entities after a direct update; em.clear() may also help, however it completely clears the persistence context, which might be undesirable
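The second option can be sketched like this, using the StorefrontProduct example from the question (the id value is illustrative):

```java
// Direct (bulk) update: bypasses the persistence context.
em.createQuery("update StorefrontProduct sp set sp.orderIndex = 0 where sp.id = :id")
        .setParameter("id", 90L)
        .executeUpdate();

StorefrontProduct product = em.find(StorefrontProduct.class, 90L);
// 'product' may still carry the pre-update orderIndex if it was already managed.

em.refresh(product); // re-reads the row from the database; orderIndex is now 0

// Or, more drastically, evict everything so later queries hit the database:
em.clear();
```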

Technical difference between Spring Boot with JOOQ and Spring Data JPA

When would you use Spring Data JPA over Spring Boot with JOOQ and vice versa?
I know that Spring Data JPA can be used for basic CRUD queries, but not really for complex join queries, while jOOQ makes those easier?
EDIT: Can you use Spring Data JPA together with jOOQ?
There is no easy answer to your question. I have given a couple of talks on that topic. Sometimes there are good reasons to have both in a project.
Edit: IMHO abstraction over the database with regard to dialects and data types is not the main point here!! jOOQ does a pretty good job of generating SQL for a given target dialect - and so does JPA / Hibernate. I would even say that jOOQ goes the extra mile to emulate functions for databases that don't have all the bells and whistles of Postgres or Oracle.
The question here is "Do I want to be able to express a query myself with everything SQL has to offer or am I happy with what JPA can express?"
Here's an example to run both together. I have a Spring Data JPA provided repository here with a custom extension (interface and implementation are necessary). I let the Spring context inject both the JPA EntityManager as well as the jOOQ context. I then use jOOQ to create queries and run them through JPA.
Why? Because expressing the query in question is not possible with JPA ("Give me the genre I listened to the most", which is not necessarily the single row with the highest count; there could be several).
The reason I run the query through JPA is simple: a downstream use case might require me to pass JPA entities to it. jOOQ can of course run this query itself, and you could work on records or map the results any way you like. But as you specifically asked about using both technologies, I thought this would be a good example:
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.Query;
import org.jooq.DSLContext;
import org.jooq.Field;
import org.jooq.Record;
import org.jooq.SelectQuery;
import org.jooq.conf.ParamType;
import org.jooq.impl.DSL;
import org.springframework.data.repository.CrudRepository;
import static ac.simons.bootiful_databases.db.tables.Genres.GENRES;
import static ac.simons.bootiful_databases.db.tables.Plays.PLAYS;
import static ac.simons.bootiful_databases.db.tables.Tracks.TRACKS;
import static org.jooq.impl.DSL.count;
import static org.jooq.impl.DSL.rank;
import static org.jooq.impl.DSL.select;
public interface GenreRepository extends
        CrudRepository<GenreEntity, Integer>, GenreRepositoryExt {

    List<GenreEntity> findAllByOrderByName();
}

interface GenreRepositoryExt {
    List<GenreWithPlaycount> findAllWithPlaycount();
    List<GenreEntity> findWithHighestPlaycount();
}

class GenreRepositoryImpl implements GenreRepositoryExt {
    private final EntityManager entityManager;
    private final DSLContext create;

    public GenreRepositoryImpl(EntityManager entityManager, DSLContext create) {
        this.entityManager = entityManager;
        this.create = create;
    }

    @Override
    public List<GenreWithPlaycount> findAllWithPlaycount() {
        final Field<Integer> cnt = count().as("cnt");
        return this.create
                .select(GENRES.GENRE, cnt)
                .from(PLAYS)
                .join(TRACKS).onKey()
                .join(GENRES).onKey()
                .groupBy(GENRES.GENRE)
                .orderBy(cnt)
                .fetchInto(GenreWithPlaycount.class);
    }

    @Override
    public List<GenreEntity> findWithHighestPlaycount() {
        /*
        select id, genre
        from (
            select g.id, g.genre, rank() over (order by count(*) desc) rnk
            from plays p
            join tracks t on p.track_id = t.id
            join genres g on t.genre_id = g.id
            group by g.id, g.genre
        ) src
        where src.rnk = 1;
        */
        final SelectQuery<Record> sqlGenerator =
            this.create.select()
                .from(
                    select(
                        GENRES.ID, GENRES.GENRE,
                        rank().over().orderBy(count().desc()).as("rnk")
                    ).from(PLAYS)
                    .join(TRACKS).onKey()
                    .join(GENRES).onKey()
                    .groupBy(GENRES.ID, GENRES.GENRE)
                ).where(DSL.field("rnk").eq(1)).getQuery();

        // Retrieve the SQL with named parameters
        final String sql = sqlGenerator.getSQL(ParamType.NAMED);
        // and create the actual Hibernate query
        final Query query = this.entityManager.createNativeQuery(sql, GenreEntity.class);
        // fill in the parameters
        sqlGenerator.getParams().forEach((n, v) -> query.setParameter(n, v.getValue()));
        // execute the query
        return query.getResultList();
    }
}
I have spoken about this a couple of times. There is no silver bullet with these technologies; sometimes it's a very fine judgement call:
The full talk is here: https://speakerdeck.com/michaelsimons/live-with-your-sql-fetish-and-choose-the-right-tool-for-the-job
As well as the recorded version of it: https://www.youtube.com/watch?v=NJ9ZJstVL9E
The full working example is here https://github.com/michael-simons/bootiful-databases.
IMHO, if you want a performant and maintainable application which uses a database at its core, you don't want to abstract away the fact that you are using a database. jOOQ gives you full control because you can read and write the actual query in your code, but with type safety.
JPA embraces the OO model, and this simply does not match the way a database works in all cases, which can result in unexpected queries such as N+1 selects because you put the wrong annotation on a field. If you are not paying enough attention, this will lead to performance issues when scaling your application. JPA Criteria helps a bit, but it is still way harder to write and read.
As a result, with JPA you first write your query in SQL and then spend half a day translating it to Criteria. After years of working with both frameworks I would use jOOQ even for a simple CRUD application (because there is no such thing as a simple CRUD application :-)).
Edit: I don't think you can mix JPA with jOOQ; the question is, why would you want to? They both use a different approach, so just choose one. It's difficult enough to learn the intricacies of one framework.
Spring Data JPA gives you the following:
An ORM layer, allowing you to treat database tables as if they were Java objects. It allows you to write code that is largely database-agnostic (you can use MySQL, Oracle, SQL Server, etc) and that avoids much of the error-prone code that you get when writing bare SQL.
The Unit of Work pattern. One reason why you see so many articles on C# explaining what a unit of work is, and practically none for Java, is because of JPA. Java has had this for 15 years; C#, well, you never know.
Domain-driven design repositories. DDD is an approach to object-oriented software that does away with the anaemic domain model you so often see in database-driven applications, where entity objects only have data and accessor methods and all business logic lives in service classes. There's more to it, but this is the most important bit that pertains to Spring Data JPA.
Integration into the Spring ecosystem, with inversion of control, dependency injection, etc.
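As a quick illustration of the ORM and repository points, here is a minimal derived-query repository (the Customer entity and method names are made up for the example; Spring Data JPA generates the implementation at runtime):

```java
import java.util.List;
import org.springframework.data.repository.CrudRepository;

public interface CustomerRepository extends CrudRepository<Customer, Long> {

    // Spring Data JPA derives the SQL from the method name alone:
    // SELECT ... WHERE last_name = ? ORDER BY first_name
    List<Customer> findByLastNameOrderByFirstName(String lastName);

    // SELECT COUNT(*) ... WHERE active = true
    long countByActiveTrue();
}
```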
jOOQ, on the other hand, is a database mapping library that implements the active record pattern. It takes an SQL-centric approach to database operations, and uses a domain-specific language for that purpose.
As happens so often, there is no one correct or superior choice. Spring Data JPA works very well if you don't care about your database. If you're happy not to do any complicated queries, then Spring Data JPA will be enough. However, once you need to do joins between tables, you notice that a Spring Data JPA repository really isn't a very good match for certain operations.
As @michael-simons mentioned, combining the two can sometimes be the best solution.
Here's an official explanation when JOOQ fits:
https://www.jooq.org/doc/latest/manual/getting-started/jooq-and-jpa/
Just because you're using jOOQ doesn't mean you have to use it for everything!
When introducing jOOQ into an existing application that uses JPA, the
common question is always: "Should we replace JPA by jOOQ?" and "How
do we proceed doing that?"
Beware that jOOQ is not a replacement for JPA. Think of jOOQ as a complement. JPA (and ORMs in general) try to solve the object graph persistence problem. In short, this problem is about:
- Loading an entity graph into client memory from a database
- Manipulating that graph in the client
- Storing the modifications back to the database
As the above graph gets more complex, a lot of tricky questions arise, like:
- What's the optimal order of SQL DML operations for loading and storing entities?
- How can we batch the commands more efficiently?
- How can we keep the transaction footprint as low as possible without compromising on ACID?
- How can we implement optimistic locking?
jOOQ only has some of the answers. While jOOQ does offer updatable records that help running simple CRUD, a batch API, and optimistic locking capabilities, jOOQ mainly focuses on executing actual SQL statements.
SQL is the preferred language of database interaction when any of the following are given:
- You run reports and analytics on large data sets directly in the database
- You import / export data using ETL
- You run complex business logic as SQL queries
Whenever SQL is a good fit, jOOQ is a good fit. Whenever you're operating on and persisting the object graph, JPA is a good fit.
And sometimes, it's best to combine both.
Spring Data JPA does support the @Query idiom with the ability to run native queries (by setting the nativeQuery flag), where we can write and see the query (simple or complex, with joins or otherwise) right there with the repository and reuse it easily.
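For instance, a hypothetical repository method along those lines (table and column names are invented for illustration):

```java
import java.util.List;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.CrudRepository;
import org.springframework.data.repository.query.Param;

public interface GenreRepository extends CrudRepository<GenreEntity, Integer> {

    // A hand-written native join query, kept right next to the
    // repository and reusable like any other repository method.
    @Query(value = "select g.* from genres g"
                 + " join tracks t on t.genre_id = g.id"
                 + " group by g.id having count(*) > :minTracks",
           nativeQuery = true)
    List<GenreEntity> findGenresWithMoreThan(@Param("minTracks") long minTracks);
}
```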
Given the above,
When would you use Spring Data JPA over Spring Boot with JOOQ and vice versa?
I would simply use Spring Data JPA, unless I were not using the Spring ecosystem itself. Another reason might be that I prefer the fluent style..
I know that Spring Data JPA can be used for completing basic CRUD queries, but not really for complex join queries
As I noted above, Spring Data JPA does provide the ability to use complex and/or join queries. In addition, the custom repository mechanism (an example that uses jOOQ is already in @Michael Simons' post above) provides even more flexibility. So it's a full-fledged solution by itself.
Can you use both Spring data jpa with jooq?
Already answered wonderfully by @Michael Simons above.

Spring transaction propagation: can't get the @OneToMany related entities when using the same transaction for the creation and consultation operations

I have the following problem: I am working on a spring-boot application which offers REST services and use a relational (SQL) database using spring-data-jpa.
I have two REST services:
- an entity-creation service, which creates the child entity and the parent entity and associates them in the same transaction. When this service ends, the data is committed to the database.
- an entity-consultation service, which gets back the parent entity with its children.
Both services are annotated with @Transactional. In production it works well: I can create a parent entity with its children in one transaction (which is committed/ended), and get it back in another transaction later.
The problem arises when I want to create integration tests. My idea was to annotate each test with @Transactional and roll back after each test. This way I keep my database clean between tests, and I don't have to generate the schema again or clean all the records in the database.
The integration test consists of creating a parent and its children and then reading them back, everything in one transaction (as the test is annotated with @Transactional). When reading the entity previously created in the same transaction, I can get the parent entity, but the children are not fetched (null value). I am not sure I understand the transaction mechanism very well: I thought that, with @Transactional on the test method, the services (also annotated with @Transactional) invoked by the test would detect and join the transaction opened by the test method (the propagation is configured to REQUIRED). Hence, as the transaction uses the same EntityManager, it should be able to return the relation between the parent entity and its children created previously in the same transaction, even though the data has not been committed to the database. The strange thing is that it retrieves the parent entity (which has not yet been committed to the database), but not its children. Is my understanding of the transaction concept correct? If not, could someone explain what I am missing?
Also, if someone has done something similar, could they please explain how they did it?
My code is quite complex. I first want to know whether I understand correctly how transactions are managed and whether someone has already done something similar. If it is really required, I can send more information about my implementation (how the transaction manager and the entity manager are initialized, the JPA entities, the services, etc...)
By injecting the EntityManager into my test and calling its flush method between the creation and the reading, the reading operation works well: I get the parent entity with its children. But the data is written to the database during the creation so it can be read later during the read operation. And I don't want the transaction to be committed, as I need my tests to work on an empty database. My misunderstanding was not so much about the transaction mechanism, but about the entity manager: it does not keep the created entities and their relations as a cache...
This post help me.
Issue with #Transactional annotations in Spring JPA
As a final word, I am thinking about calling an SQL script before each test to empty my database.
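The flush-without-flush-visibility problem described above can be handled inside a rolled-back test roughly like this (creationService/consultationService and the Parent entity stand in for the asker's two transactional services; all names are hypothetical):

```java
@Test
@Transactional // Spring rolls the whole transaction back after the test
public void createAndReadParentWithChildren() {
    Long parentId = creationService.createParentWithChildren("parent", 3);

    // Push the pending INSERTs to the database inside the still-open
    // transaction, then detach everything so the read below actually
    // re-fetches the parent and its children instead of returning the
    // cached instance with an unpopulated collection.
    entityManager.flush();
    entityManager.clear();

    Parent parent = consultationService.getParentWithChildren(parentId);
    assertEquals(3, parent.getChildren().size());
}
```

Because the transaction is rolled back at the end of the test, the flushed rows never become visible to other transactions, and the database stays clean between tests.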

In memory database, with hibernate and periodically persisting to an actual db

I would like to use an in-memory DB with Hibernate, so my queries are super quick.
Moreover, I would like to periodically persist that in-memory state to a real MySQL DB.
Of course, the in-memory database should load its initial content on startup from that MySQL DB.
Are there any good frameworks/practices for that purpose? (I'm using Spring.) Any tutorials or pointers will help.
I'll be honest with you: most decent databases can be considered in-memory to an extent, given that they cache data and try to avoid hitting the hard disk as much as they can. In my experience the best in-memory databases are either caches, or amalgamations of other data sources that are already persisted in some other form, and are then updated live for time-critical information or refreshed periodically for non-time-critical information.
Loading data from a cold start in to memory is potentially a lengthy process, but subsequent queries are going to be super-quick.
If you are trying to cache what's already persisted, you can look at memcache, but in essence in-memory databases always rely on a more persistent source, be it MySQL, SQL Server, Cassandra, MongoDB, you name it.
So it's a little unclear what you're trying to achieve, suffice to say it is possible to bring data in from persistent databases and have a massive in memory cache, but you need to design around how stale certain data can get, and how often you need to hit the real source for up-to-the-second results.
Actually, the simplest approach would be to use some core Hibernate features for this: use the Hibernate Session itself and combine it with the second-level cache.
Declare the entities you want to cache as @Cacheable:
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
public class SomeReferenceData { ... }
Then implement the periodic flushing like this, supposing you are using JPA:
- open an EntityManager
- load the entities you want to cache using that entity manager and no other
- keep the entity manager open until the next periodic flush; Hibernate keeps track of which instances of SomeReferenceData were modified in memory via its dirty-checking mechanism, but no modification queries are issued
- reads are served from the second-level cache, so they never hit the database
- when the moment comes to flush the session, just begin a transaction and commit immediately
- Hibernate will update the modified entities in the database, update the second-level cache and resume execution
- eventually close the entity manager and replace it with a new one if you want to reload everything from the database; otherwise keep the same entity manager open
Code example to illustrate the overall idea:
public class PeriodicDBSynchronizeTest {

    @Test
    public void testSynch() {
        // create the entity manager, and keep it
        EntityManagerFactory factory = Persistence.createEntityManagerFactory("testModel");
        EntityManager em = factory.createEntityManager();

        // kept in memory due to @Cacheable
        SomeReferenceData ref1 = em.find(SomeReferenceData.class, 1L);
        SomeReferenceData ref2 = em.find(SomeReferenceData.class, 2L);
        SomeReferenceData ref3 = em.find(SomeReferenceData.class, 3L);
        // ...

        // modifications are tracked but not committed
        ref1.setCode("005");

        // these two lines will flush the modifications into the database
        em.getTransaction().begin();
        em.getTransaction().commit();

        // continue using the ref data, tracking modifications until the next flush
        // ...
    }
}

JPA: Native Queries does not trigger execution of cached inserts/updates in same transaction

I have a JUnit test where I set up the test data at the beginning of the test case, and then test the conditions of the test case afterwards in the same test method. The query that tests the conditions is a native query. I know that I have to explicitly call EntityManager.flush() in order for my inserts/updates to be written immediately to the DB, since they are in the same transaction. Additionally, I noticed that I can replace entityManager.flush() with a JPA query, which seems to achieve the same thing. I have heard that JPA will cache DB operations in the same transaction until there is a need to execute them immediately, such as when select queries are issued. So this all makes sense. My question is: why doesn't this behavior also apply to native queries? Here my native query does not trigger the immediate execution of the inserts/updates from testSetup(), thus causing my assert to fail.
@Test
@Transactional
public void testCase() {
    testSetup();
    entityManager.flush(); // can be replaced with entityManager.createQuery("from Person");
    List resultList = entityManager.createNativeQuery("select * from Person").getResultList();
    Assert.assertTrue(resultList.size() == 1);
}
tl;dr - native queries bypass the persistence context and the cache.
This includes the queries you create by calling createNativeQuery, obviously. But bulk updates (UPDATE and DELETE), though expressed in JPQL, are translated to native queries by the provider and bypass the persistence context and the cache as well.
Thus flushing or executing other queries will not have the expected effect.
Besides, if your native queries have changed the data on entities that are managed in the current persistence context or are cached, the entities will not be refreshed automatically and will become stale.
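In code, both pitfalls and their fixes look roughly like this (using the Person entity from the question; the constructor, id and values are illustrative, and the comments describe the behaviour the asker observed in their setup):

```java
em.persist(new Person("Alice"));   // queued in the persistence context

em.flush(); // the native query below did not auto-flush here, so push the INSERT manually

List<?> rows = em.createNativeQuery("select * from Person").getResultList();

// The reverse direction: a native update leaves managed entities stale.
em.createNativeQuery("update Person set name = 'Bob' where id = :id")
        .setParameter("id", 1L)
        .executeUpdate();

Person p = em.find(Person.class, 1L); // may still return the cached "Alice"
em.refresh(p);                        // re-reads the row; now "Bob"
```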
