Spring Data JPA save huge entity list async

I have a huge entity list, say 10,000 items, and I want to use the CRUD repository to save the list asynchronously, so that the API returns without waiting for the save result (because the save can take a long time). Is it possible to use the @Async annotation to do so?

Yes, you can use @Async for committing to the database. However, keep in mind that the commit will then run in a separate thread with its own transaction. You will not see any intermediate results until that transaction commits.
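A minimal sketch of that approach, assuming @EnableAsync is present on a configuration class (Item and ItemRepository are placeholder names):

import java.util.List;
import java.util.concurrent.CompletableFuture;

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AsyncSaveService {

    private final ItemRepository repository;

    public AsyncSaveService(ItemRepository repository) {
        this.repository = repository;
    }

    // Runs on a thread from Spring's task executor; the caller returns
    // immediately. Note that @Async only takes effect when the method is
    // invoked through the Spring proxy, i.e. from another bean.
    @Async
    @Transactional
    public CompletableFuture<List<Item>> saveAllAsync(List<Item> items) {
        // saveAll runs in its own transaction on the async thread; nothing
        // is visible to other transactions until it commits.
        return CompletableFuture.completedFuture(repository.saveAll(items));
    }
}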

Related

Spring JPA performance the smart way

I have a service that listens to multiple queues and saves the data to a database.
One queue gives me a person.
Now if I code it really simply, I just get one message from the queue at a time.
I do the following:
Start transaction
Select from person table to check if it exists.
Either update existing or create a new entity
repository.save(entity)
End transaction
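In code, that simple flow might look roughly like this (a sketch; Person, PersonRepository and the message shape are assumed):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class PersonMessageHandler {

    private final PersonRepository repository;

    public PersonMessageHandler(PersonRepository repository) {
        this.repository = repository;
    }

    // One transaction per message: select, then update or create.
    @Transactional
    public void handle(PersonMessage message) {
        Person person = repository.findById(message.getId())
                .orElseGet(Person::new); // create a new entity if none exists
        person.setId(message.getId());
        person.setName(message.getName());
        repository.save(person);
    }
}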
The above is clean and robust, but I get a lot of messages and it's not fast enough.
To improve performance I have done this.
Fetch 100 messages from the queue, then:
Start transaction
Select all persons where id in (...) in one query, using the ids from the incoming persons
Iterate the messages and for each one check whether it was selected above. If yes, update it; if not, create a new one
Save all changes with batch update/create
End transaction
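A sketch of this batched variant, reusing the assumed types from the sketch above:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Added to the same PersonMessageHandler as above.
@Transactional
public void handleBatch(List<PersonMessage> messages) {
    List<Long> ids = messages.stream()
            .map(PersonMessage::getId)
            .collect(Collectors.toList());

    // One query: select all persons whose id is in the incoming batch.
    Map<Long, Person> existing = repository.findAllById(ids).stream()
            .collect(Collectors.toMap(Person::getId, Function.identity()));

    List<Person> toSave = new ArrayList<>();
    for (PersonMessage message : messages) {
        // Update the selected entity, or create a new one if it was absent.
        Person person = existing.getOrDefault(message.getId(), new Person());
        person.setId(message.getId());
        person.setName(message.getName());
        toSave.add(person);
    }

    // saveAll lets Hibernate batch the statements when
    // hibernate.jdbc.batch_size is configured.
    repository.saveAll(toSave);
}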
If it's a simple message, the above is really good; it performs. But if the message, or the logic I have to run when I receive it, is complicated, then the above is not so good, since there is a chance that some of the messages will result in a rollback, and the code becomes hard to read.
Any ideas on how to make it run fast in a smarter way?
Why do you need to roll back? Can't you just not execute whatever it is that then has to be rolled back?
IMO the smartest solution would be to code this with a single "upsert" statement. Not sure which database you use, but PostgreSQL, for example, has the ON CONFLICT clause for inserts that can be used to do updates if the row already exists. You could even configure Hibernate to use that on insert via the @SQLInsert annotation.
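A sketch of what that could look like on PostgreSQL (the table and column names are assumptions; note that the parameter order in @SQLInsert must match the order in which Hibernate binds the properties, by default alphabetical with the id last):

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.SQLInsert;

@Entity
// Hibernate uses this statement instead of its generated INSERT,
// turning every insert into an upsert.
@SQLInsert(sql = "INSERT INTO person (name, id) VALUES (?, ?) "
        + "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name")
public class Person {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted
}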

Spring REST - Concurrent requests on database from differents APIs

I have two Frontends consuming JSON from two different Backends using the JSON Web Token. These backends act on the same database.
In the db for example I have the Driver, Customer and Trip tables. The customer or the driver can cancel a trip only if it has not been canceled beforehand by one of them. Some transactions are recorded during a cancellation.
How can I prevent double execution in this case, when the customer and the driver simultaneously send a request for trip cancellation?
I am using Spring Boot (RESTful) and Spring JPA.
Any help will be greatly appreciated.
Edit:
Assume these backends are A and B: the Customer is requesting cancellation from backend A, and the Driver from B.
Use optimistic locking. Your code would look as follows:
@Entity
public class Trip {

    @Version
    @NotNull
    private Long version;

    ...
}
It works as follows. Each change modifies the version. Suppose two users (or two services) loaded the same version of the trip. Now they both try to cancel it, i.e. they both try to modify it. Along with their changes, they both send the version. JPA checks whether the version in the update statement is the same as the one in the database, so the first request wins and is executed; during the execution the version is incremented.
Now the 2nd request arrives and also wants to cancel the trip. JPA will see that the version attribute in its update statement is older (less) than the version value in the database. Thus the 2nd request will not be executed and an OptimisticLockException will be thrown.
You can catch this exception, inform the user that the data were changed in the meantime, and suggest that they reload. The user reloads the data and sees that the trip has already been cancelled.
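A sketch of the service and controller side (TripRepository, the cancelled flag, and the endpoint path are assumed names; Spring translates the JPA exception to ObjectOptimisticLockingFailureException):

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.orm.ObjectOptimisticLockingFailureException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@Service
class TripCancellationService {

    private final TripRepository repository;

    TripCancellationService(TripRepository repository) {
        this.repository = repository;
    }

    @Transactional
    public void cancel(Long tripId) {
        Trip trip = repository.findById(tripId)
                .orElseThrow(() -> new IllegalArgumentException("unknown trip"));
        trip.setCancelled(true);
        // At commit, the UPDATE runs as "... WHERE id = ? AND version = ?".
        // If the other backend committed a cancellation first, no row matches
        // and Spring raises ObjectOptimisticLockingFailureException.
    }
}

@RestController
class TripController {

    private final TripCancellationService service;

    TripController(TripCancellationService service) {
        this.service = service;
    }

    @PostMapping("/trips/{id}/cancellation")
    public ResponseEntity<String> cancel(@PathVariable Long id) {
        try {
            service.cancel(id);
            return ResponseEntity.ok("Trip cancelled");
        } catch (ObjectOptimisticLockingFailureException e) {
            // Concurrent modification: ask the client to reload the trip.
            return ResponseEntity.status(HttpStatus.CONFLICT)
                    .body("Trip was already modified; please reload");
        }
    }
}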

Spring Data Query Execution Optimization: Parallel Execution of Hibernate @Query Method in JpaRepository

I have a Dashboard view, which requires small sets of data from tables all over the database. I optimized the database queries (e.g. removed sub-queries). There are now ~20 queries which are executed one after the other, and which are fetching different data sets from the database. Most of the HQL queries contain GROUP BY and JOIN clauses. With a Spring REST interface, the result is returned to the front-end.
How do I optimize the execution of the custom queries? My initial thought was to run the database queries in parallel. But how do I achieve that? After doing some research I found the annotation @Async, which makes it possible to run methods in parallel. But does this work with Hibernate methods? Is there always a new database session created for every method annotated with @Query in a JpaRepository? Does running the database queries in parallel have an effect on the overall execution time at all?
Another way to run the database calls in parallel is to split the Dashboard call into several single Ajax calls (every concern gets its own Ajax call). I didn't want to do that, because every time the dashboard is opened (or e.g. the date range is changed), another 20 Ajax calls are made to fetch the new data. And the same question remains: does running SQL queries in parallel have an effect on the execution time of the database?
I have not yet added additional indices to the database; that will definitely be the next thing I do. However, I'm interested in the performance impact of running the queries in parallel, and in how to achieve this programmatically with Spring.
My project was initially generated by jHipster (Spring Boot, MariaDB, AngularJS etc.)
First, running these SQLs in parallel will not impact the database; it will only make the page load faster, so the design should focus on that.
I am posting this answer assuming that you have already made sure that you cannot combine these 20 SQLs because the data is unrelated (no joins, views, etc).
I would advise against using @Async, for two reasons.
Reason 1 - An asynchronous task is great when you want to fire a bunch of tasks and forget, or when you know when all the tasks will be complete. So you will need to "wait" for all your asynchronous tasks to complete. How long should you wait? Until the slowest query is done?
Check this sample code for @Async (from the guides at spring.io: https://spring.io/guides/gs/async-method/):
// Wait until they are all done
while (!(page1.isDone() && page2.isDone() && page3.isDone())) {
    Thread.sleep(10); // 10-millisecond pause between each check
}
Will/should your service component wait on 20 Async DAO queries?
Reason 2 - Remember that @Async just spawns off the task in a separate thread. Since you are going to work with JPA, remember that entity managers are not thread-safe, and DAO classes will propagate transactions. Here is an example of the problems that may crop up: http://alexgaddie.blogspot.com/2011/04/spring-3-async-with-hibernate-and.html
IMHO, it is better to go ahead with multiple Ajax calls, because that will make your components cohesive. Yes, you will have 20 endpoints, but each would have a simpler DAO and simpler SQL, be easily unit-testable, and return a data structure that is easier to handle/parse by the AngularJS widgets. When the UI triggers all 20 Ajax calls, the dashboard loads individual widgets as they become ready, instead of loading all of them at the same time. This will help you extend your design in the future by optimizing the slower-loading sections of your dashboard (maybe caching, indexing, etc.).
Bunching your DAO calls will only make the data structure complex and unit testing harder.
Normally it will be much faster to execute the queries in parallel. If you are using Spring Data and do not configure anything specific, your JPA provider (Hibernate) will create a connection pool that stores connections to your database. I think by default it holds 10 connections, and by doing so it is prepared to run 10 queries in parallel. How much faster the queries are when run in parallel depends on the database and the structure of the tables/queries.
I think that using @Async is not the best practice here. Defining 20 REST endpoints, each providing the result of a specific query, is a much better approach. That way you can simply create the Entity, Repository and RestEndpoint class for each query, so each query is isolated and the code is less complex.
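A sketch of one such isolated endpoint (WidgetSale, its fields, and the path are placeholder names):

import java.util.List;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// One repository with one custom query per dashboard widget.
interface WidgetSaleRepository extends JpaRepository<WidgetSale, Long> {

    @Query("SELECT w.region, SUM(w.amount) FROM WidgetSale w GROUP BY w.region")
    List<Object[]> totalsByRegion();
}

@RestController
class RegionTotalsController {

    private final WidgetSaleRepository repository;

    RegionTotalsController(WidgetSaleRepository repository) {
        this.repository = repository;
    }

    // One small endpoint per widget; the dashboard fires these as separate
    // Ajax calls, so the widgets load independently and in parallel.
    @GetMapping("/dashboard/region-totals")
    public List<Object[]> regionTotals() {
        return repository.totalsByRegion();
    }
}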

Spring Transaction propagation: can't get the @OneToMany related entities when using the same transaction for creation and consultation operations

I have the following problem: I am working on a Spring Boot application which offers REST services and uses a relational (SQL) database via spring-data-jpa.
I have two REST services:
- an entity-creation service, which creates the child entity and the parent entity and associates them in the same transaction. When this service ends, the data are committed to the database.
- an entity-consultation service, which gets back the parent entity with its children
These two services are annotated with the @Transactional annotation. In the production case it works well: I can create a parent entity with its children in one transaction (which is committed/ended), and get it back in another transaction later.
The problem is when I want to create integration tests. My idea was to annotate each test with the @Transactional annotation and roll back after each test. This way I keep my database clean between tests, and I don't have to generate the schema again or clean all the records in the database.
The integration test consists of creating a parent and its children and then reading them, all in one transaction (as the test is annotated with @Transactional). When reading the entity previously created in the same transaction, I can get the parent entity, but the children are not fetched (null value). I am not sure I understand the transaction mechanism very well: I was thinking that, by using @Transactional on the test method, the services (annotated with @Transactional) invoked by the test should detect and use the same transaction opened by the test method (the propagation is configured to REQUIRED). Hence, as the transaction uses the same EntityManager, it should be able to return the relation between the parent entity and its children created previously in the same transaction, even though the data have not been committed to the database. The strange thing is that it retrieves the parent entity (which has not yet been committed to the database), but not its children. Is my understanding of the transaction concept correct? If not, could someone explain what I am missing?
Also, if someone has done something similar, could they explain how they did it, please?
My code is quite complex. I first want to know whether I understand correctly how transactions are managed, and whether someone has already done something similar. If it is really required, I can send more information about my implementation (how the transaction manager and the entity manager are initialized, the JPA entities, the services, etc.).
Binding the EntityManager in my test and calling its flush method between the creation and the reading, the reading operation works well: I get the parent entity with its children. The data are written to the database during the creation so they can be read later during the read operation, but they are not committed, and I don't want the transaction to be committed, as I need my test to work on an empty database. My misunderstanding was not so much about the transaction mechanism, but more about the entity manager: it does not keep the created entities and their relations as a cache...
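A sketch of such a test, assuming a JUnit 4 / Spring Boot setup; the service names and entity accessors are placeholders:

import static org.junit.Assert.assertFalse;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.transaction.annotation.Transactional;

@RunWith(SpringRunner.class)
@SpringBootTest
@Transactional // each test runs in one transaction, rolled back at the end
public class ParentChildIT {

    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private CreationService creationService;          // placeholder name

    @Autowired
    private ConsultationService consultationService;  // placeholder name

    @Test
    public void childrenAreVisibleAfterFlush() {
        Long parentId = creationService.createParentWithChildren();

        // Push the pending INSERTs to the database inside the still-open
        // transaction. Nothing is committed, so the rollback at the end of
        // the test leaves the database empty.
        entityManager.flush();
        // Optional: clear the persistence context so the next read actually
        // re-fetches the association from the database.
        entityManager.clear();

        Parent parent = consultationService.findParentWithChildren(parentId);
        assertFalse(parent.getChildren().isEmpty());
    }
}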
This post helped me:
Issue with #Transactional annotations in Spring JPA
As a final word, I am thinking about calling an SQL script before each test to empty my database.

Customize Spring's JdbcBatchItemWriter to use different SQL for every record

I have a requirement where I will receive a flat file from a vendor, and I need to read the records and insert/update/delete them in my DB table. I get an action flag from the vendor indicating whether I need to insert/update/delete each particular record. The flat file will contain a huge number of records, and I do not want to do manual steps like checking the action flag for every record (by overriding the write() method of ItemWriter and looping over the items list in the chunk), constructing the SQL manually, and using JdbcTemplate to do the DB operation for every record.
Can I achieve this using JdbcBatchItemWriter? Is there a way to set the SQL for every record in the chunk so that Spring Batch will do a batch update? How can the ItemPreparedStatementSetter be invoked in that case?
Since your choice is at the record level, take a look at the ClassifierCompositeItemWriter (http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html). That ItemWriter implementation takes a Classifier implementation that it uses to determine which ItemWriter to use. From there, you can configure one ItemWriter that does inserts, one for updates, and one for deletes. Each record will be funneled through to the correct instance and assuming your delegates are JdbcBatchItemWriters, you'll get the same batching you normally do (one batch for inserts, one for updates, and one for deletes).
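A configuration sketch (VendorRecord with its getAction() flag, and the SQL statements, are assumptions):

import javax.sql.DataSource;

import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriterConfig {

    @Bean
    public ClassifierCompositeItemWriter<VendorRecord> classifierWriter(DataSource dataSource) {
        JdbcBatchItemWriter<VendorRecord> insertWriter =
                writer(dataSource, "INSERT INTO person (id, name) VALUES (:id, :name)");
        JdbcBatchItemWriter<VendorRecord> updateWriter =
                writer(dataSource, "UPDATE person SET name = :name WHERE id = :id");
        JdbcBatchItemWriter<VendorRecord> deleteWriter =
                writer(dataSource, "DELETE FROM person WHERE id = :id");

        ClassifierCompositeItemWriter<VendorRecord> writer = new ClassifierCompositeItemWriter<>();
        // Route each record to a delegate by its action flag; each delegate
        // batches its own statements, so you still get one JDBC batch per
        // operation type within the chunk.
        writer.setClassifier(record -> {
            switch (record.getAction()) {
                case "I": return insertWriter;
                case "U": return updateWriter;
                default:  return deleteWriter;
            }
        });
        return writer;
    }

    private JdbcBatchItemWriter<VendorRecord> writer(DataSource dataSource, String sql) {
        JdbcBatchItemWriter<VendorRecord> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        writer.setSql(sql);
        // Bind the named parameters from the record's bean properties.
        writer.setItemSqlParameterSourceProvider(
                new BeanPropertyItemSqlParameterSourceProvider<>());
        writer.afterPropertiesSet();
        return writer;
    }
}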
