I am using Spring Batch with partitioning to do parallel processing, and Hibernate with Spring Data JPA for the database. For the partition step, the reader, processor and writer are step-scoped, so I can inject the partition key and range (from-to) into them. In the processor I have one synchronized method that I expected to run only once at a time, but that is not the case.
I set it up with 10 partitions, and all 10 item readers read the correct partitioned range. The problem comes with the item processor. The code below has the same logic I use.
public class AccountProcessor implements ItemProcessor<Item, Item> {

    @Autowired
    private AccountRepository accountRepo;

    @Override
    public Item process(Item item) {
        createAccount(item);
        return item;
    }

    // Account has a unique constraint on username, gender and email.
    /*
     When one thread executes this method, it creates one account and saves it.
     If the next thread comes in and tries to save the same account,
     it should find the account created by the first thread and do an update instead.
     But that is not what happens: findIfExist returns null
     and it tries to insert the duplicate data again.
    */
    private synchronized void createAccount(Item item) {
        Account account = accountRepo.findIfExist(item.getUsername(), item.getGender(), item.getEmail());
        if (account == null) {
            // account doesn't exist yet
            account = new Account();
            account.setUsername(item.getUsername());
            account.setGender(item.getGender());
            account.setEmail(item.getEmail());
            account.setMoney(10000);
        } else {
            account.setMoney(account.getMoney() - 10);
        }
        accountRepo.save(account);
    }
}
The expected behaviour is that only one thread runs this method at any given time, so there are no duplicate insertions in the database and no DataIntegrityViolationException.
The actual result is that the second thread can't find the account created by the first thread, tries to create a duplicate account and save it, and that causes a DataIntegrityViolationException (unique constraint violation).
Since I synchronized the method, the threads should execute it in order: the second thread should wait for the first thread to finish before running, which means it should be able to find the account created first.
I tried many approaches, like a volatile set containing all unique accounts, calling saveAndFlush to commit as soon as possible, and using ThreadLocal, but none of these works.
I need some help.
Since you made the item processor step-scoped, you don't really need synchronization as each step will have its own instance of the processor.
But it looks like you have a design problem rather than an implementation issue. You are trying to synchronize threads to act in a certain order in a parallel setup. When you decide to go parallel and divide the data into partitions, giving each worker (either local or remote) a partition to work on, you must accept that these partitions will be processed in an undefined order and that there should be no relation between the records of each partition or between the work done by each worker.
Regarding: "When one thread executes that method, it will create one account and save it. If the next thread comes in and tries to save the same account, it should find the account created by the first thread and do an update. But that doesn't happen; instead findIfExist returns null and it tries to insert the duplicate data again."
That's because the transaction of thread 1 may not be committed yet, hence thread 2 won't find the record you think has been inserted by thread 1.
It looks like you are trying to create or update some accounts with a partitioned setup. I'm not sure if this setup is suitable for the problem at hand.
As a side note, I would not call accountRepo.save(account); in an item processor but rather do that in an item writer.
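For illustration, here is a minimal sketch of what that could look like, reusing the Item, Account and accountRepo names from your snippet and assuming the List-based ItemWriter signature of Spring Batch 4. Note that this only moves the save out of the processor; it does not by itself remove the cross-partition race described above.
public class AccountWriter implements ItemWriter<Item> {

    private final AccountRepository accountRepo;

    public AccountWriter(AccountRepository accountRepo) {
        this.accountRepo = accountRepo;
    }

    @Override
    public void write(List<? extends Item> items) {
        for (Item item : items) {
            // find-or-create logic moved out of the processor and into the writer
            Account account = accountRepo.findIfExist(item.getUsername(), item.getGender(), item.getEmail());
            if (account == null) {
                account = new Account();
                account.setUsername(item.getUsername());
                account.setGender(item.getGender());
                account.setEmail(item.getEmail());
                account.setMoney(10000);
            } else {
                account.setMoney(account.getMoney() - 10);
            }
            accountRepo.save(account);
        }
    }
}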
Hope this helps.
I am working on a Spring application.
We have a specific requirement: when we get a specific event, we want to look it up in the DB. If we find the record in the DB, we delete it, create another event using its details and trigger that event.
Now my concern is:
I do not want to use two different calls, one to find the record and another to delete it.
I am looking for a way to delete the record using a custom query and simultaneously fetch the deleted record.
This would save us from making two separate calls to the DB, one to fetch and one to delete.
What I found on the internet so far:
We can use a custom query for the deletion with the @Modifying annotation, but this does not allow us to return the deleted object: methods annotated with @Modifying can only return void or int (sketched below).
Spring also provides removeBy and deleteBy derived queries, but these also return only an int and not the complete record that is being deleted.
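For reference, the limitation described above looks roughly like this (a sketch only; the entity and field names are taken from the desired signature below):
@Modifying
@Transactional
@Query("delete from FulfilmentAcknowledgement f where f.entityId = :entityId and f.itemId = :itemId")
int deleteByEntityIdAndItemId(@Param("entityId") String entityId, @Param("itemId") String itemId);
// returns the number of deleted rows; returning the deleted FulfilmentAcknowledgement itself is not supported here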
I am specifically looking for something like:
@Transactional
FulfilmentAcknowledgement deleteByEntityIdAndItemIdAndFulfilmentIdAndType(@Param(value = "entityId") String entityId,
        @Param(value = "itemId") String itemId,
        @Param(value = "fulfilmentId") Long fulfilmentId,
        @Param(value = "type") String type);
Is it possible to get the deleted record from DB and make the above call work?
I could not find a way to retrieve the actual object being deleted, either via a custom @Query or via named queries. The only methods that return the object being deleted are deleteById and removeById, but for those we need the primary key of the record being deleted, and it is not always possible to have that primary key at hand.
So far, the best way that I found to do this was:
Fetch the record from the DB using a custom query.
Delete the record from the DB by calling deleteById. You could delete it with any method at this point, since we no longer need the object being returned; I still chose deleteById because my DB is indexed on the primary key and deleting by it is faster.
We can use Reactor or an ExecutorService to run these operations asynchronously and in parallel.
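A minimal sketch of that fetch-then-delete approach wrapped in a single transaction (the repository, the finder name and getId() are assumptions, not a confirmed API):
@Transactional
public FulfilmentAcknowledgement findAndDelete(String entityId, String itemId, Long fulfilmentId, String type) {
    // 1. fetch the record with a custom/derived query
    FulfilmentAcknowledgement record =
            repository.findByEntityIdAndItemIdAndFulfilmentIdAndType(entityId, itemId, fulfilmentId, type);
    if (record != null) {
        // 2. delete by primary key, which is indexed and therefore fast
        repository.deleteById(record.getId());
    }
    // the caller can build and trigger the new event from the returned record
    return record;
}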
In my Spring Boot application, I have multiple threads running the following @Transactional method in parallel.
@Transactional
public void run(Customer customer) {
    Customer existing = this.clientCustomerService.findByCustomerName(customer.getName());
    if (existing == null) {
        this.clientCustomerService.save(customer);
    }
    // other database operations
}
When this runs on multiple threads at the same time, since the customer object will not be saved until the end of the transactional block, is there any possibility of duplicate customers in the database?
If your Customer has an @Id field that defines the primary key column of the Customer table, the database will throw an exception such as javax.persistence.EntityExistsException. Even if you run your code on multiple threads, at any point in time, at the database level, only one of them will acquire a lock on the newly inserted row. You should also define an @Version column/field at the entity level in order to use optimistic locking. You can find more details about this here.
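As a minimal sketch of the two annotations mentioned above (the field names are just assumptions for illustration):
@Entity
public class Customer {

    // primary key column; the database rejects a second insert with the same id
    @Id
    private Long id;

    private String name;

    // version field at the entity level, used for optimistic locking on concurrent updates
    @Version
    private Long version;

    // getters and setters omitted
}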
I have a working solution in place and I'm starting this thread to have a discussion on the best approach.
Environment : EF6, SQL 2012
Scenario:
I have Task and TaskDetail tables which have a parent/child relationship through TaskID.
Create Method:
While creating a task, I need to ensure an entry is made in the TaskDetail table as well.
First approach:
An entry is made into the Task table. SaveChanges. Get the TaskID and assign it to the DTO which has the information for the detail table. Pass the DTO to the TaskDetail create method. SaveChanges. Commit. If any error occurs, roll back the entire transaction.
Second Approach:
Add the relevant fields of the Task table. Add the relevant fields of the TaskDetail table as well. Add the new detail object to the Task entity through the navigation property: Task.TaskDetail.Add(newObj). Finally, SaveChanges.
Question 1:
Both approaches yield the same SQL; I couldn't notice much of a difference. But what would be the best approach for doing this?
Question 2:
Also, if you take a look at my scenario, you will notice that it is a save-all-or-save-none approach. Initially I tried looping through the DbEntityEntries and then rolling back the changes, but that only works for the second approach described above and not for the first one, since I am making a save after the insertion in order to get the TaskID. I finally ended up using the DbContextTransaction introduced in EF6. But what is the best approach?
Question 3:
Update Method:
While doing an update, as per my requirement, I will not touch the Task table. The update deals with the TaskDetail table alone, though the TaskID will be required and will be passed from the UI.
• Get the existing task detail using the task ID and the active flag (there is a one-to-many relationship)
• Update the active flag to false
• Create a new entry in the TaskDetail table
I have translated the above steps into code as well, but what would be the best approach to handle it?
I am new to Spring. My application, developed with Spring Roo, has a cron job that every day downloads some files and updates a database.
The update is done, after downloading and parsing the files, using merge().
An entity class Dataset has a list called resources; after the download I do:
dataset.setResources(resources);
dataset.merge();
and dataset.merge() does the following:
@Transactional
public Dataset Dataset.merge() {
    if (this.entityManager == null) this.entityManager = entityManager();
    Dataset merged = this.entityManager.merge(this);
    this.entityManager.flush();
    return merged;
}
I expected that by doing dataset.setResources(resources); I would overwrite the field resources, and so the database entries would be overwritten as well.
But I get double entries in the database: every resource appears twice, with different (incremental) IDs.
How can I get my application to do updates instead of inserts? A naive solution would be to manually delete the old resources and then call merge(); is this the way to go, or is there a smarter solution?
This situation occurs when you use Hibernate as the persistence engine and your entities have a version field.
Normally the ID field is all we need to merge a detached object with its persistent state in the database, but Hibernate also takes the version field into account, and if you don't set it (it is null), Hibernate discards the value of the ID field and creates a new object with a new ID.
To find out whether you are affected by this surprising Hibernate behaviour, set a value in the version field: if an exception is thrown, you are. In that case the best fix is to make the parsed data contain the right value for the version field. Other options are to disable version checking (see the Hibernate reference guide for details) or to load the persistent state before merging.
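As an illustration of the last option (loading the persistent state before merging), a rough sketch assuming Dataset exposes its id and version fields:
@Transactional
public Dataset mergeWithVersion(Dataset parsed) {
    // load the current persistent state so we know the existing version
    Dataset persistent = entityManager.find(Dataset.class, parsed.getId());
    if (persistent != null) {
        // copy the version so Hibernate performs an update instead of a fresh insert
        parsed.setVersion(persistent.getVersion());
    }
    return entityManager.merge(parsed);
}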
Using an Oracle database.
Here's how I think the SQLException happens...
Say I have two instances of a service running in parallel. Both of them do the following:
Query the cache (B) to see if the person exists there.
If the person exists but is out of date, OR does not exist at all, query the main database (A).
If the person is found in database (A) and was NOT found earlier in cache (B), INSERT; else, if the person was found in the cache earlier but was out of date, UPDATE the cache.
I use the following code to make the decision, based on the earlier query to cache B:
void insertOrUpdate(RegistryPersonMo person) {
    if (person.getId() == null) {
        insertPerson(person);
    } else {
        updatePerson(person);
    }
}
and insert using Spring JDBC:
void insertPerson(RegistryPersonMo person) {
    Number id = insertInto("PERSON_REGISTRY", "RAAMAT").usingGeneratedKeyColumns("ID").executeAndReturnKey(usingParameters(person));
    if (id != null) {
        person.setId(id.longValue());
    }
}
The actual problem occurs when both instances of the service have finished querying the cache (B) and the person wasn't found (null). One instance then does an INSERT, because the data did not exist. The other gets an SQLException when trying to do the same, because a row covered by the unique constraint already exists.
Does anyone know what the best/standard workaround is? Some ideas I've had:
Lock reading of the row until the insert is done. Can I do this using Spring?
Use a replace or insert-with-ignore style statement. I'm still learning; are there any downsides to these?
Bear in mind I'd like to use Spring and automate the query as much as possible.
I think it's fine in this situation to just ignore the unique constraint exception. Yes, this is a race condition, but an expected one: the desired outcome is achieved and the record is inserted. Perhaps log it to be able to see how often this happens.
Locking or transaction serialization would resolve the issue, but in my opinion it wouldn't make much sense in this case.
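For what it's worth, a minimal sketch of that "ignore and log" idea around the code from the question, assuming Spring's DuplicateKeyException is what surfaces on the unique constraint violation and that log is an ordinary SLF4J logger:
void insertOrUpdate(RegistryPersonMo person) {
    try {
        if (person.getId() == null) {
            insertPerson(person);
        } else {
            updatePerson(person);
        }
    } catch (DuplicateKeyException e) {
        // the other service instance won the race and inserted the row first;
        // the desired outcome (the row exists in the cache) is achieved, so just log it
        log.info("Duplicate insert ignored, the row already exists in the cache", e);
    }
}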